OpenRouter launches Fusion API: three-model hybrid approximation Fable 5, costing only half as much

OpenRouter officially launched the Fusion API on June 13, allowing developers to call multiple models in parallel through a single API call, then have a Judge model fuse the outputs to produce the best answer. In the DRACO deep research benchmark test, Fusion scored 69%, surpassing Claude Fable 5's 65.3%, while a low-cost panel composed of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro was less than 1% behind, yet at half the cost.
(Background: Google-backed AI routing platform OpenRouter, valued at $1.3 billion, grew 240% in one year)
(Additional context: OpenRouter analyzed a 100 trillion token research report: What do humans really do with AI, the rise of Chinese models, and user retention secrets)

Table of Contents

Toggle

  • DRACO Benchmark Testing: Fusion Fully Surpasses Single Models
  • Budget Panels Can Compete: Three-Model Fusion Only Slightly Behind Fable by Less Than 1%
  • Fusion Is Not a Replacement for Fable, But Its Use Cases Are Clear
  • Four Call Methods Explained at a Glance

The popular AI routing platform OpenRouter officially released the Fusion API on June 13. This new feature allows developers to send the same prompt to multiple models simultaneously, then have a Judge model analyze and fuse all responses to produce the final answer, all with a single API call.

The core mechanism of Fusion is quite straightforward: when a user sends a prompt, OpenRouter parallelizes the request to several models within a "panel" (each equipped with web search and web fetch tools). The Judge model then reviews all panel responses, produces a structured analysis including consensus points, contradictions, partial overlaps, unique insights, and blind spots. Finally, the calling model writes the final answer based on this analysis. The entire pipeline runs on the server side, providing an experience identical to calling a single model.

DRACO Benchmark Testing: Fusion Fully Surpasses Single Models

The OpenRouter team evaluated using Perplexity AI’s DRACO deep research benchmark, which covers 100 complex research tasks across 10 fields. The scoring criteria include factual correctness (about 20 items), breadth and depth (about 9 items), presentation quality (about 6 items), and citation quality (about 5 items), with a negative weighting mechanism that penalizes models for providing incorrect information.

The scores for various configurations are as follows:

  • Fusion (Fable 5 + GPT-5.5 → Opus 4.8 Fusion): 69.0% 🥇
  • Fusion (Opus 4.8 + GPT-5.5 + Gemini 3.1 Pro → Opus 4.8 Fusion): 68.3%
  • Fusion (Opus 4.8 + GPT-5.5 → Opus 4.8 Fusion): 67.6%
  • Fusion (Opus 4.8 fused with itself): 65.5%
  • Claude Fable 5 single model: 65.3% (only completed 93/100 questions due to content filtering)
  • Fusion (Gemini 3 Flash + Kimi K2.6 + DeepSeek V4 Pro → Opus 4.8 Fusion): 64.7% 🔥
  • DeepSeek V4 Pro single model: 60.3%
  • GPT-5.5 single model: 60.0%
  • Claude Opus 4.8 single model: 58.8%

Budget panels can also compete: three-model fusion only slightly behind Fable by less than 1%

The most surprising result comes from a "budget panel" consisting of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro—relatively inexpensive models. After fusion, they scored 64.7%, beating GPT-5.5 (60.0%) and Opus 4.8 (58.8%), and coming within less than 1% of Claude Fable 5, while costing only half as much. This means developers can achieve research-level depth with lower inference costs.

Another noteworthy finding is that "fusing itself" also works. When Opus 4.8, as two members of the panel (two copies of the same model), is fused with Opus 4.8 as the Judge, it scores 65.5%, outperforming the single Opus 4.8 (58.8%) by 6.7 points. This demonstrates that much of Fusion’s performance gain comes from the synthesis step itself—running the same model twice, with different reasoning paths, tools, and sources, can yield significant improvements.

Fusion Is Not a Replacement for Fable, But Its Use Cases Are Clear

OpenRouter CEO Alex Atallah stated on X that Fusion can achieve "Fable-level intelligence at half the price." However, the team also admits that the DRACO benchmark does not include long-horizon tasks, which are Claude Fable 5’s true strength. For complex tasks requiring multi-step reasoning and long context, Fable remains irreplaceable in the short term.

Regarding software development scenarios, Fusion is not designed to directly replace programming models. OpenRouter has designed Fusion as a server tool: when foundational models encounter questions requiring deep research (such as architecture decisions or best practices), it can automatically decide whether to call Fusion for multi-angle analysis. Routine coding tasks continue to be handled by the main model.

Four Call Methods at a Glance

Developers can use Fusion in four ways:

  • Chatroom trial: Visit openrouter.ai/fusion, select a preset or build your own panel
  • Model slug: In API, specify "model": "openrouter/fusion" to automatically include the default frontier panel
  • Server tool: Add {"type": "openrouter:fusion"} to the tools array, allowing the main model to decide when to call
  • Plugin mode: Include the plugins parameter in API calls to customize panel model combinations

The default panel call cost for Fusion is about 50% lower than Fable, but response times are roughly 2-3 times longer (due to waiting for multiple models to run in parallel and then fuse). OpenRouter said it will continue improving performance based on user feedback.

This article is sourced from OpenRouter Blog, compiled and organized by Dynamic Trends.

View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
Add a comment
Add a comment
No comments
  • Pinned