Vercel: DeepSeek's token usage exceeds OpenAI, accounting for only 1% of total expenses

robot
Abstract generation in progress

ME AI Message, according to Beating Monitoring, Vercel released the AI Gateway Production Index for June 2026. The report shows that, driven by the launch in May of the DeepSeek V4 series (including the Flash and Pro models) on Vercel Gateway, DeepSeek’s token traffic share surged from less than 1% to 17% within a single month, surpassing OpenAI (13%) to take third place. However, due to extremely low pricing, the total cost for all users using DeepSeek accounts for only about 1% of the gateway’s total funding expenditure. Pricing is the main reason behind DeepSeek’s rapid breakout.

DeepSeek V4 Flash charges only $0.14 for million-token input and $0.28 for output, which is 20 to 50 times cheaper than comparable leading frontier models from Anthropic, and also 8 to 12 times cheaper than Qwen 3.6 Plus and Kimi K2.6. Evaluations indicate that DeepSeek V4 meets performance requirements, prompting the development team to deploy it quickly in production.

Despite the traffic boom for low-cost models, frontier models still dominate funding consumption. In May, Anthropic’s spending share rose from 61% to 65%, accounting for 70% to 80% of spending in high-difficulty scenarios such as application generation, back-end intelligent agents, and programming. For example, in the programming intelligent agent scenario, DeepSeek contributed 49% of token traffic but only accounted for 4% of costs, while Anthropic used 28% of traffic yet consumed 70% of the funds.

The development team is managing budgets through intelligent routing—diverting high-frequency, low-risk tasks to low-cost models and using frontier models only at critical points. Considerations of return on investment (ROI) have also slowed model upgrades. For instance, Google’s Gemini 3.5 Flash launched in May with a higher price than version 3.0, resulting in slow migration; by the end of the month, 3.0 Flash still accounted for 90% of traffic in the Flash series, while 3.5 Flash accounted for only 7%. Meanwhile, AI intelligent agents show extremely high token consumption density, with more than half of the tokens consumed by just a quarter of the requests. (Source: BlockBeats)

View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
Add a comment
Add a comment
No comments
  • Pinned