OpenRouter: The AI gap between China and the US is only 3 to 6 months, and open-source models are sweeping the world

OpenRouter highlights several representative models: DeepSeek V4 Flash has entered real agentic workflows at ultra-low prices, GLM 5.2 has taken the quality throne as the top-ranked open-weight model on Artificial Analysis, and NVIDIA Nemotron 3 Ultra represents the fully open American camp.

Table of Contents

  • DeepSeek Crashes Prices to the Floor
  • GLM Takes the Quality Throne
  • Team USA: NVIDIA Nemotron 3 Ultra

Two years ago, the open-weight throne still belonged to Meta's Llama. Now, data from OpenRouter, the world's largest neutral LLM router, shows that Llama has fallen off the leaderboard entirely. As of May 2026, Chinese open-weight models account for about 61% of the tokens processed on the platform, and DeepSeek alone took 17.6% in a single week.

Behind this reversal lies an underappreciated fact: open-weight models consistently trail US frontier labs in intelligence and capability by three to six months, and the gap is not widening. For any organization that watches its cloud bill, moving workloads from frontier models to open weights saves real money.

DeepSeek Crashes Prices to the Floor

DeepSeek V4 Flash is the first open-weight model that teams have put straight into real agentic workflows as a substitute for frontier models from Anthropic or OpenAI. The larger V4 Pro scored 80.6% on SWE-bench Verified, the highest score among open-weight models. (SWE-bench Verified is a test set built from real GitHub code-repair tasks; it evaluates whether a model can write working code.)

DeepSeek V4-Pro

  • Cache Miss Input Cost: $0.30 per million tokens
  • Cache Hit Input Cost: $0.03 per million tokens
  • Output Cost: $0.50 per million tokens

DeepSeek R1 (Deep Reasoning & Thinking Expert)

  • Cache Miss Input Cost: $0.55 per million tokens
  • Cache Hit Input Cost: $0.14 per million tokens
  • Output Cost: $2.19 per million tokens

DeepSeek V4-Flash (Ultra-Fast & Low-Cost Choice)

  • Cache Miss Input Cost: $0.14 per million tokens
  • Cache Hit Input Cost: $0.0028 per million tokens
  • Output Cost: $0.28 per million tokens

A cache hit occurs when a request repeats a prompt or conversation history that has already been submitted. Those input tokens are then billed at a much lower rate, a reduction of 70% to 90%.
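
The effect of caching on a bill can be checked with simple arithmetic. The sketch below is illustrative only: it uses the prices listed above, while the function names, the token counts, and the `hit_rate` parameter (the fraction of input tokens served from cache) are assumptions made for the example, not part of any DeepSeek or OpenRouter API.

```python
# Prices in USD per million tokens, copied from the lists above.
PRICES = {
    "DeepSeek V4-Pro": {"miss": 0.30, "hit": 0.03, "out": 0.50},
    "DeepSeek R1": {"miss": 0.55, "hit": 0.14, "out": 2.19},
    "DeepSeek V4-Flash": {"miss": 0.14, "hit": 0.0028, "out": 0.28},
}


def cache_discount(model: str) -> float:
    """Fraction by which a cache hit lowers the input price."""
    p = PRICES[model]
    return 1 - p["hit"] / p["miss"]


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 hit_rate: float) -> float:
    """Cost in USD of one request.

    hit_rate is a hypothetical fraction (0.0 to 1.0) of input tokens
    that are billed at the cache-hit price.
    """
    p = PRICES[model]
    cached = input_tokens * hit_rate
    uncached = input_tokens - cached
    total = cached * p["hit"] + uncached * p["miss"] + output_tokens * p["out"]
    return total / 1_000_000


if __name__ == "__main__":
    for name in PRICES:
        print(f"{name}: cache discount {cache_discount(name):.0%}")
    # Hypothetical agent turn: 50,000 input tokens, 2,000 output tokens,
    # 80% of the input already cached from earlier turns.
    print(f"{request_cost('DeepSeek V4-Flash', 50_000, 2_000, 0.8):.6f} USD")
```

Computed from the listed prices, the discount works out to about 90% for V4-Pro, about 75% for R1, and about 98% for V4-Flash, so the V4-Flash figure as listed falls outside the 70% to 90% range quoted in the text.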

GLM Takes the Quality Throne

GLM 5.2, the open model released by z-ai in mid-June, ranks first among open-weight models on Intelligence Index v4.1 from the third-party evaluator Artificial Analysis. It scored 51 points, ahead of Nemotron 3 Ultra (48 points) and MiniMax M3 and DeepSeek V4 Pro (44 points), and only about 5 points behind the closed Claude Fable 5. On GDPval-AA, a more realistic agentic evaluation, it is roughly on par with GPT-5.5.

Its strength is planning. For architecture design, whole-repo refactoring, and long-running agent tasks, GLM 5.2 is currently the portable alternative that comes closest to the Opus style. The trade-off is that it likes to think: OpenRouter's weighted average actual pricing is $0.447 per million tokens for input and $3.31 per million tokens for output.
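
To see why a model that "likes to think" costs more in practice, it helps to split a request's cost into its input and output parts. This is a minimal sketch using the two OpenRouter weighted average prices quoted above; the function names and token counts are hypothetical and chosen only for illustration.

```python
# OpenRouter weighted average prices for GLM 5.2 quoted above,
# in USD per million tokens.
GLM_INPUT_PRICE = 0.447
GLM_OUTPUT_PRICE = 3.31


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD of one request at the quoted prices."""
    return (input_tokens * GLM_INPUT_PRICE
            + output_tokens * GLM_OUTPUT_PRICE) / 1_000_000


def output_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of the request cost that comes from output tokens."""
    total = cost_usd(input_tokens, output_tokens)
    if total == 0:
        return 0.0
    return (output_tokens * GLM_OUTPUT_PRICE / 1_000_000) / total


if __name__ == "__main__":
    # One output token costs as much as roughly 7.4 input tokens.
    print(f"price ratio: {GLM_OUTPUT_PRICE / GLM_INPUT_PRICE:.1f}x")
    # Hypothetical request: 20,000 input tokens, 8,000 output tokens
    # (output here includes any reasoning tokens billed as output).
    print(f"cost: {cost_usd(20_000, 8_000):.4f} USD")
    print(f"output share: {output_share(20_000, 8_000):.0%}")
```

At these prices, output tokens dominate the bill as soon as a request produces more than about one output token for every seven input tokens, which is why long reasoning traces matter more than the headline input price.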

What's more intriguing is the timing. A few days before GLM 5.2's debut, a US export control directive forced Anthropic to widely disable Fable 5 and Mythos 5 to prevent access by foreign nationals. On one side are closed models that can be cut off at any moment for geopolitical reasons; on the other are MIT-licensed, near-frontier, self-hostable open weights.

Team USA: NVIDIA Nemotron 3 Ultra

Open weights do not come only from China. NVIDIA recently released Nemotron 3 Ultra, which scored 48 points on the same leaderboard, making it the strongest US open-weight model and second only to GLM 5.2.

It has 550 billion parameters, of which 55 billion are active, uses a hybrid Mamba-2 and Transformer architecture, and is released under the OpenMDW license. In simple terms, OpenMDW means releasing not only the weights but also the training data, recipes, and evaluation tools. NVIDIA's logic is straightforward: the more open models are used, the more Blackwell chips, CUDA, and enterprise services that run these models are sold.
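
The two parameter counts can be turned into rough sizing numbers. The sketch below is a back-of-the-envelope estimate under stated assumptions: the bytes-per-parameter values (2 for 16-bit, 1 for 8-bit) are assumptions, since the article does not say at what precision the weights are released, and the estimate covers weights only, ignoring caches, activations, and runtime overhead.

```python
# Parameter counts quoted above for Nemotron 3 Ultra.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9


def active_fraction() -> float:
    """Share of the parameters that are active, per the quoted counts."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Storage for the weights alone, in GB (1 GB = 1e9 bytes).

    bytes_per_param is an assumption: 2 for 16-bit, 1 for 8-bit.
    """
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.0%}")
    for label, width in (("16-bit", 2), ("8-bit", 1)):
        print(f"{label} weights: {weight_memory_gb(TOTAL_PARAMS, width):.0f} GB")
```

Under these assumptions, only about a tenth of the parameters are active, but a self-hosted deployment still has to store all 550 billion weights, which is the part that sets the hardware requirement.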
