Token as Productive Capacity: Large Model Price War Begins

Securities Daily Reporter Yuan Chuanxi

Recently, a surge in AI agents has swept across industries. AI agents are working their way into everyday work and life at unprecedented speed.

Behind this wave is an exponential rise in demand for computing power. Large-scale deployment of personal AI agents has driven massive consumption of Tokens (the smallest units of text that large models process), quickly pushing costs past what major model providers can absorb.

Recently, domestic AI companies including Beijing Zhipu Huazhang Technology Co., Ltd. (hereinafter “Zhipu”) and Tencent Cloud have announced price increases for AI computing power products, with some prices rising by more than 400%. This strategic shift from “burning money for growth” to “raising prices to boost volume” not only marks the end of the industry’s unchecked growth but also reflects a profound change in the supply and demand for computing power in the AI agent era.

Reconstruction of Large Model Pricing Systems

The pricing system for large models is undergoing a systematic overhaul, with domestic providers accelerating price hikes. This phenomenon contrasts sharply with the price wars two years ago.

In May 2024, ByteDance took the lead in initiating a price war, pricing its Doubao Pro model at 0.0008 yuan per 1,000 Tokens, which is 99.3% lower than the industry average. Subsequently, Alibaba Cloud’s Tongyi Qianwen main models reduced prices by 97%, Baidu’s Wenxin large models became completely free, and Tencent’s Hunyuan large model saw a maximum price drop of 87.5%. For a time, the industry was caught in a price-cutting frenzy.
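The percentages above can be converted back into absolute prices with simple arithmetic. The Python sketch below derives the industry-average price implied by the Doubao Pro figures quoted here. The function names are illustrative, and the result is only as precise as the rounded 99.3% figure.

```python
def implied_reference_price(discounted_price: float, discount_pct: float) -> float:
    """Return the reference price implied by a price quoted as
    `discount_pct` percent below that reference."""
    if not 0 <= discount_pct < 100:
        raise ValueError("discount_pct must be in [0, 100)")
    return discounted_price / (1 - discount_pct / 100)


def cost_for_tokens(tokens: int, price_per_1k: float) -> float:
    """Cost in yuan of `tokens` Tokens at a price quoted per 1,000 Tokens."""
    return tokens / 1000 * price_per_1k


# Figures from the article: 0.0008 yuan per 1,000 Tokens,
# described as 99.3% below the industry average.
doubao_price = 0.0008
industry_avg = implied_reference_price(doubao_price, 99.3)

print(f"Implied industry average: {industry_avg:.4f} yuan per 1,000 Tokens")
print(f"1 million Tokens at the Doubao Pro price: "
      f"{cost_for_tokens(1_000_000, doubao_price):.2f} yuan")
```

By this estimate, the pre-war industry average works out to roughly 0.11 yuan per 1,000 Tokens, which is why a price of 0.0008 yuan was read as a declaration of war.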

“At that time, the logic was simple: let developers use it first; market share was more important than anything else,” a product manager with three years of experience in AI told Securities Daily. In 2024, one leading company set an internal target of disregarding profitability for three years, with product prices even falling below the cost of computing power.

However, the marginal effects of low-price strategies quickly diminished. Industry analysts told Securities Daily that while the price war from 2024 to 2025 accelerated the market adoption of AI large models, it also led to a widespread “high investment, low return” dilemma. As model invocation volumes soared from hundreds of billions to trillions, computing costs increased exponentially, making reliance on capital infusions unsustainable. From the second half of 2025, some small and medium-sized companies began quietly reducing free quotas.

“This is not simply a price increase but an inevitable result of changes in cost structure,” an executive from a leading cloud provider told Securities Daily. “In the past, the industry used losses to gain market share; by 2026, sustainable operation must be considered.”

Token Inflation

To understand the collective price hikes of domestic large models, one must first grasp the concept of “Token inflation.”

Tokens are the smallest units of text processing in large models, akin to a measure of AI workload. When the industry talks about Token inflation, it refers to a surge in the complexity of AI tasks, leading to higher consumption of computing resources for the same service. It’s like switching from lighting a small lamp to powering a factory—electricity costs naturally rise.

This “inflation” pressure primarily stems from explosive overseas market demand. In February 2026, OpenRouter (a leading global API distribution platform for large models) data showed that the total Token consumption of the top ten AI models worldwide exceeded 27 trillion that month, with Chinese large models contributing 14 trillion, accounting for over 50%.

“This indicates that domestic large models are shifting from being driven by domestic demand to being exported globally,” said Zhang Yi, CEO of Guangzhou iMedia Data Intelligence Consulting Co., Ltd., in an interview with Securities Daily. “Overseas users’ usage habits are very different from domestic ones.” European and American developers prefer embedding large models into production workflows, where a single request often involves multiple tool calls, long-context retrieval, and code generation. “A single API call in overseas scenarios can consume three to five times more Tokens than in China.”

If overseas markets are the external factor, then the large-scale deployment of AI intelligent agents is the internal driver pushing up computing costs.

Unlike early question-and-answer chatbots, AI agents possess a closed-loop capability of “perception–decision–execution,” enabling them to complete complex tasks autonomously. Take a financial risk control scenario: a single AI agent completing a loan approval goes through four stages, namely user profile retrieval (long context), credit data call (tool use), risk assessment calculation (reasoning chain), and report generation (output), with total Token consumption reaching hundreds of thousands.
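To make the four-stage loan approval example concrete, the following Python sketch tallies Token usage per stage. The stage names come from the example above. Every Token count is a hypothetical placeholder, chosen only so that the total lands in the "hundreds of thousands" range cited, and the `Stage` class and helper functions are illustrative names rather than any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    input_tokens: int   # Tokens sent to the model (context, tool results)
    output_tokens: int  # Tokens the model generates


# Stage names follow the article; all Token counts are hypothetical
# placeholders, not measured values.
LOAN_APPROVAL_STAGES = [
    Stage("user profile retrieval (long context)", 120_000, 2_000),
    Stage("credit data call (tool use)", 30_000, 1_500),
    Stage("risk assessment calculation (reasoning chain)", 40_000, 15_000),
    Stage("report generation (output)", 20_000, 6_000),
]


def tokens_by_stage(stages: list[Stage]) -> dict[str, int]:
    """Map each stage name to its combined input and output Tokens."""
    return {s.name: s.input_tokens + s.output_tokens for s in stages}


def total_tokens(stages: list[Stage]) -> int:
    """Total Tokens consumed across all stages of one agent task."""
    return sum(tokens_by_stage(stages).values())


for name, used in tokens_by_stage(LOAN_APPROVAL_STAGES).items():
    print(f"{name}: {used:,} Tokens")
print(f"total for one approval: {total_tokens(LOAN_APPROVAL_STAGES):,} Tokens")
```

Under these assumed numbers, the long-context retrieval stage dominates the bill, which illustrates why agent workloads consume far more Tokens than a single chatbot reply.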

Together, these factors have produced striking figures. Guolian Minsheng Securities estimated that China’s overall daily Token consumption surged from 100 billion in early 2024 to 180 trillion in February 2026. As AI agents evolve toward multimodal and multi-agent collaboration, growth in this figure continues to accelerate.
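The scale of that increase is easier to grasp as a growth multiple and an average monthly rate. The Python sketch below computes both from the two figures in the estimate. It assumes "early 2024" means January 2024, which gives 25 months to February 2026, and it assumes smooth compound growth, which the estimate itself does not claim.

```python
import math


def growth_multiple(start: float, end: float) -> float:
    """How many times larger `end` is than `start`."""
    return end / start


def compound_monthly_rate(start: float, end: float, months: int) -> float:
    """Average monthly growth rate, assuming smooth compounding."""
    return (end / start) ** (1 / months) - 1


def doubling_time_months(monthly_rate: float) -> float:
    """Months needed to double at a constant monthly growth rate."""
    return math.log(2) / math.log(1 + monthly_rate)


# Figures from the article: daily Token consumption in China.
start_tokens = 100_000_000_000      # 100 billion, early 2024
end_tokens = 180_000_000_000_000    # 180 trillion, February 2026
months = 25                         # assumption: January 2024 to February 2026

multiple = growth_multiple(start_tokens, end_tokens)
rate = compound_monthly_rate(start_tokens, end_tokens, months)

print(f"growth multiple: {multiple:,.0f}x")
print(f"implied average monthly growth: {rate:.1%}")
print(f"implied doubling time: {doubling_time_months(rate):.1f} months")
```

Under these assumptions, daily consumption grew 1,800-fold, equivalent to roughly 35% per month, or a doubling a little more than every two months.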

The reversal of supply and demand ultimately influences pricing. Since 2025, global AI computing infrastructure has faced capacity shortages, with server procurement costs rising sharply year-over-year due to tight supplies of HBM (high-bandwidth memory, critical for AI training) and advanced-process GPUs (graphics processing units).

For example, on March 17, Alibaba Cloud announced that, due to surging global AI demand and rising supply chain prices, it would raise prices for AI computing and storage products by up to 34%.

As large model providers shift from “water sellers” to “water drinkers,” price hikes have become an unavoidable choice for maintaining service quality. Zhipu explicitly stated in its price adjustment notice: “The rapid increase in user scale and invocation volume requires us to increase investment in computing power.”

Reconstruction of Business Models

The price hikes not only address cost gaps but also signify a deep restructuring of the entire industry’s business logic.

“After the price war ends, the real value war begins,” said the aforementioned cloud executive. The executive believes 2026 will be the year of large-scale AI commercialization, with industry competition shifting from simply owning computing power to providing efficient, stable, and low-cost model services and AI applications.

Currently, the large model industry is shifting from “traffic subsidies” to “value filtering.” Early low-price strategies attracted many users who were merely experimenting, leading to inefficient use of computing resources. One company estimated that 40% of its free quota was used for testing with no actual business scenario behind it. Through moderate price increases, companies can filter out non-essential demand and ensure service stability for high-quality clients. The significant price hikes by Zhipu, Tencent Cloud, and others are in effect a balancing act between corporate customers’ willingness to pay and their ROI (return on investment). This refined “raising prices to boost volume” approach marks the transition of China’s large model industry from internet-style scale expansion to the value-based pricing of the software industry.

Pan Helin, a member of the Information and Communications Economic Expert Committee of the Ministry of Industry and Information Technology, told Securities Daily that price increases will not suppress genuine demand but will accelerate the “good money driving out the bad.” Corporate clients’ high requirements for stability and compliance give them a willingness to pay and a higher lifetime value, providing confidence for large model providers to shift from “traffic thinking” to “value pricing.”

This transformation is reshaping the distribution of interests across the entire industry chain. Upstream computing power providers (like NVIDIA) continue to benefit. Midstream cloud providers (like Alibaba Cloud and Tencent Cloud) seek a balance between selling models and selling computing power, aiming to attract customers with AI services while avoiding being overwhelmed by high hardware costs. The downstream application layer shows clear differentiation: large companies with R&D capabilities (such as ByteDance and Baidu) can flexibly allocate computing resources internally to hedge against price increases, while small startups relying solely on API calls face soaring costs and potential shutdowns.

Enterprise-level large model providers are also beginning to focus on the deep changes in Token economics. Yang Lei, co-founder and executive director of DeepTech Co., Ltd., told Securities Daily, “In the future, Tokens will represent productive capacity. As Skill-based Models reshape industries like software development, data analysis, and customer service outsourcing, traditional per-person, per-day pricing will be replaced by ‘Token consumption’ pricing. This is not just a change in measurement units but a leap in the productivity paradigm.”

Zhang Yi noted that from a global competitive perspective, Token inflation is also a byproduct of domestic model technological advancement. Price increases are not the end but the beginning of a new efficiency revolution. Those who can continuously optimize cost structures in this arms race of computing power will secure a position on the global AI agent stage.

Looking back at the 2024 price war and today’s collective price hikes, China’s large model industry is undergoing a painful rite of passage. The era of relying on ultra-low prices to win attention has ended. A new era of winning through technological efficiency, customer value, and closed-loop ecosystems is gradually unfolding amid the torrent of the Token economy.
