In May 2026, a report drew widespread industry attention: a tech company had set no cap on its employees' Claude usage, and its monthly token consumption reached the equivalent of about $500 million. The case is notable less for its scale than for the measurement fallacy it exposes: the company treated token consumption as a proxy for how intensively employees used AI, yet never established any control mechanism tied to business output.

If this “bill explosion” was merely an administrative oversight, internal practices at Silicon Valley giants such as Meta and Amazon reveal deeper issues. Meta launched an internal leaderboard called “Claudeonomics” that tracked the token consumption of more than 85,000 employees and awarded titles such as “Token Legend” and “Model Connoisseur” to encourage competition for rank. Data shows that over 30 days, total consumption across all employees reached about 60 trillion tokens, estimated at roughly $9 billion based on Anthropic’s publicly available pricing; the top employee consumed about 2.81 billion tokens in a month, worth over $140,000. Amazon’s internal “KiroRank” was meant to promote AI adoption in business scenarios, but it led employees to deliberately run meaningless operations to “boost scores,” and senior vice presidents ultimately criticized the practice with the message “don’t use AI just for the sake of using AI.” Token consumption is a technical metric; once it was adopted internally as a management yardstick, it caused large-scale incentive misalignment.

This raises a core question: can token consumption serve as a basis for evaluating AI agents, companies, or employees? If not, what indicators truly hold evaluative value?

We believe using token consumption as an evaluation metric is dangerous because it confuses “cost” with “value,” and “process” with “output.” In an era where intelligent agents are becoming mainstream AI applications, the real asset is not tokens, but the agents themselves.

2. How Did Token Assetization Come About?

2.1 Commercial Maturity of Token Valuation

Tokens, as the smallest unit of processing text in large models, have been established as the fundamental pricing unit in the AI industry. In March 2026, the National Science and Technology Nomenclature Review Committee officially named the Chinese term for Token as “词元” (word element), releasing it for public trial, and the National Data Administration further defined it as the “settlement unit” in the AI era. According to disclosures from the National Data Administration, in Q1 2026, China’s daily token calls exceeded 140 trillion, over a thousandfold increase from early 2024. This standardization effort reflects a growing industry consensus on token-based pricing systems.

Looking at current pricing regimes, the token market is highly polarized. For international mainstream models, OpenAI’s GPT-4o charges $2.50 per million input tokens and $10 per million output tokens; Claude Sonnet 4.6 charges $3 for input and $15 for output. In April 2026, OpenAI officially released the GPT-5.5 series and the premium GPT-5.5 Pro, with API prices set at $30 for input and $180 for output per million tokens. Meanwhile, domestic large models engage in fierce price wars: on May 27, 2026, Xiaomi announced a permanent price cut of up to 99% for the MiMo-V2.5 series API, with cache hit input prices dropping to 0.025 yuan per million tokens; DeepSeek launched the V4 series, with the flagship V4-Pro’s discounted cache hit input price also as low as 0.025 yuan per million tokens. There is no unified “fair value” for token valuation—different models and scenarios can see the same token consumption correspond to prices differing by hundreds or even thousands of times.

2.2 Rise and Problems of Tokenmaxxing

There is a dangerous disconnect between the token as a pricing unit, which is technically sound and has regulatory backing, and the corporate practice of turning it into a management yardstick. “Tokenmaxxing” began to gain popularity inside companies around 2025, driven by a simple management logic: since the company has paid for AI tools, employees should use them as much as possible to realize a return on the investment.

However, the data reveals how fragile this logic is. Some studies estimate that every $1 spent on AI token procurement may carry implicit losses of about $0.50 to $0.80, including error correction, code rewriting, and review delays. Analysis shows that the top 10% of heavy Claude Code users consume about ten times as many tokens as average developers, yet produce only about twice the output. Companies like Amazon and Meta have shut down their internal token leaderboards, and Uber exhausted its annual AI token budget within four months. The industry is shifting from “use AI as much as possible” to a more cautious phase that asks “whether the money spent is worth it.”
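The arithmetic behind these two figures can be sketched as follows. This is a minimal illustration that uses only the ratios quoted above (about ten times the tokens, about twice the output, and $0.50 to $0.80 of implicit loss per $1); the function names are hypothetical.

```python
def relative_output_per_token(token_multiple: float, output_multiple: float) -> float:
    """Output per token of a heavy user, relative to an average developer (= 1.0)."""
    return output_multiple / token_multiple


def effective_cost_per_dollar(implicit_loss: float) -> float:
    """Total cost incurred for each $1 of token spend, including implicit losses."""
    return 1.0 + implicit_loss


# Top 10% of heavy users: about 10x the tokens for about 2x the output.
heavy_user_efficiency = relative_output_per_token(10.0, 2.0)
print(f"Output per token vs. average developer: {heavy_user_efficiency:.0%}")

# Implicit losses of $0.50 to $0.80 per $1 of token procurement.
low = effective_cost_per_dollar(0.50)
high = effective_cost_per_dollar(0.80)
print(f"Effective cost per $1 of tokens: ${low:.2f} to ${high:.2f}")
```

On these figures, a heavy user's output per token is about one fifth of an average developer's, which is why raw consumption is a poor proxy for productivity.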

2.3 Emergence of the Intelligent Agent Economy

Meanwhile, discussions centered on token consumption almost entirely ignore a structural change: AI agents are evolving from “additional capabilities” of large models to independent technological and economic entities. In May 2026, the Cyberspace Administration, National Development and Reform Commission, and Ministry of Industry and Information Technology jointly issued the “Implementation Opinions on the Normative Application and Innovative Development of Intelligent Agents,” clarifying that intelligent agents are an important form of AI products and services. At the developer conference in May 2026, Anthropic repositioned Claude Code from “AI programming assistant in the terminal” to “asynchronous automation infrastructure for engineering teams,” and officially adopted a “hybrid pricing model” of “base fee + pay based on actual compute consumption.” Their Claude Managed Agents strategy signifies a deeper shift: model vendors are beginning to sell agent runtime infrastructure directly, transitioning from selling tokens to selling agent operational capabilities.

In this context, the limitations of token consumption as a measure become more apparent, prompting accelerated exploration of alternative evaluation systems.

3. Is Token a Suitable Metric?

3.1 Four Structural Flaws of Token as a Metric

First, token confuses cost with value. Baidu’s Robin Li stated explicitly at the Create 2026 AI Developer Conference: “Token only represents cost, not profit; it measures input, not output.” Tsinghua professor Ma Shaoping explained from a technical perspective: “Token itself does not carry intelligence; it’s just a carrier of information. The intelligence of AI agents lies in modeling the relationships within token sequences.” Using token consumption as a performance indicator is akin to a factory measuring output by its electricity consumption: a higher electricity bill does not necessarily mean higher productivity, and it may instead indicate inefficiency or poor management.

Second, token lacks a cross-model, cross-task measurement standard. Different large models count tokens differently; for example, tokenizer changes between versions of Anthropic’s models can cause significant variation in the token count for the same text. Tasks with similar business goals can also require vastly different numbers of tokens. More fundamentally, when token prices already differ by more than a hundredfold among model vendors, using token consumption as a performance benchmark is logically inconsistent.
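A toy illustration of the tokenization point: the two functions below are deliberately simplistic stand-ins, not the tokenizers of any real model, but they show how the same text yields different token counts under different splitting rules.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Toy tokenizer A: split on whitespace."""
    return text.split()


def fixed_width_tokens(text: str, width: int = 4) -> list[str]:
    """Toy tokenizer B: cut the text into chunks of `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]


text = "Token consumption measures input, not output."

count_a = len(whitespace_tokens(text))
count_b = len(fixed_width_tokens(text))

print(f"Tokenizer A counts {count_a} tokens")
print(f"Tokenizer B counts {count_b} tokens")
print(f"Same text, count ratio B/A: {count_b / count_a:.1f}x")
```

Comparing token totals across models therefore compares different units, before any difference in price or task is taken into account.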

Third, token-based assessment causes incentive misalignment. When token consumption becomes part of performance evaluation, employees face a pseudo-goal: maximize token usage. Engineers no longer aim to complete tasks with minimal tokens but instead inflate task chains and add redundant reasoning steps to increase token counts—an economic behavior of “metric alienation,” clearly reflected in Meta and Amazon’s practices.

Fourth, token struggles to capture quality of completion. An intelligent agent that solves complex engineering problems accurately in one go may consume far fewer tokens than a low-quality agent that repeatedly tries, backtracks, and gradually approaches the answer. More token consumption often indicates lower efficiency—precisely the opposite of evaluation goals.
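The fourth point can be made concrete with a small sketch. The two agent records below are invented for illustration (they are not measurements), and `AgentRun` and `tokens_per_resolved_task` are hypothetical names. Ranking by raw tokens and ranking by tokens per resolved task give opposite orders.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    name: str
    tokens_used: int      # total tokens consumed across all attempts
    tasks_attempted: int
    tasks_resolved: int   # tasks whose result passed review or tests


def tokens_per_resolved_task(run: AgentRun) -> float:
    """Tokens spent per successfully resolved task (infinite if none resolved)."""
    if run.tasks_resolved == 0:
        return float("inf")
    return run.tokens_used / run.tasks_resolved


# Hypothetical data: one agent solves tasks in one pass, the other retries and backtracks.
runs = [
    AgentRun("one_pass_agent", tokens_used=2_000_000, tasks_attempted=10, tasks_resolved=9),
    AgentRun("retry_heavy_agent", tokens_used=12_000_000, tasks_attempted=10, tasks_resolved=6),
]

by_raw_tokens = sorted(runs, key=lambda r: r.tokens_used, reverse=True)
by_efficiency = sorted(runs, key=tokens_per_resolved_task)

print("Top of a token leaderboard:", by_raw_tokens[0].name)
print("Fewest tokens per resolved task:", by_efficiency[0].name)
```

A token leaderboard would reward the retry-heavy agent, while an output-normalized measure favors the agent that resolves more tasks with fewer tokens.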

3.2 Redefining the Core Asset as the Intelligent Agent

This analysis points to a fundamental conclusion: tokens are resources consumed, while intelligent agents are the entities creating value. The relationship is akin to electricity consumption versus the work done by an electric motor—total power usage can be measured, but the real value lies in how much work the motor accomplishes and what products it produces.

Anthropic’s strategic development supports this view. The new Claude model released in May 2026 emphasizes “agentic coding, computer use, knowledge work, financial analysis”—real work scenarios where agents intervene. More importantly, Anthropic’s platform strategy for managed agents shifts from selling model invocation rights to providing agent runtime services. This change fundamentally moves the value carrier from underlying compute consumption to application-layer entities.

According to Claude Code’s lead, the current subscription-based pricing model is unprofitable. This indicates that relying solely on token pricing cannot cover the true costs of intelligent agents—their value depends on task completion, automation depth, and workflow embedding, which cannot be effectively captured by token metrics.

3.3 Emergence of New Evaluation Benchmarks

When token metrics prove insufficient, the industry is exploring alternatives. For task completion, SWE-bench Verified has become a recognized rigorous standard, requiring models to autonomously locate and fix bugs in real GitHub codebases. Public leaderboards show Claude Sonnet 4 scores about 80.20%, and models like Claude Opus 4.6 score around 78–80%. These benchmarks focus on “how many task units” an agent completes, not on token consumption.

For business value, Baidu proposes DAA (Daily Active Agents), defined as “how many agents are actively working and delivering results every day.” The focus shifts from “how much AI is used” to “how many tasks AI completes.”
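One possible reading of this definition is sketched below. Baidu's exact counting rules are not given here, so the event schema (`agent_id`, `day`, `delivered`) and the rule that an agent counts as active only on days when it delivered at least one result are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def daily_active_agents(events):
    """Count distinct agents per day that delivered at least one result.

    `events` is an iterable of (agent_id, day, delivered) tuples
    (a hypothetical schema, not a published specification).
    """
    active = defaultdict(set)
    for agent_id, day, delivered in events:
        if delivered:
            active[day].add(agent_id)
    return {day: len(agents) for day, agents in active.items()}


# Hypothetical event log.
events = [
    ("contract-agent", date(2026, 5, 1), True),
    ("contract-agent", date(2026, 5, 1), True),   # same agent, same day: counted once
    ("report-agent", date(2026, 5, 1), False),    # ran but delivered nothing
    ("report-agent", date(2026, 5, 2), True),
]

daa = daily_active_agents(events)
for day in sorted(daa):
    print(day.isoformat(), daa[day])
```

Note that token volume does not appear anywhere in this calculation: an agent that ran and consumed tokens without delivering a result is not counted.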

Amazon is also exploring “normalized deployment” metrics to replace token tracking, emphasizing whether engineers can continuously generate valuable code via AI. The 2026 FinOps Foundation report shows 98% of surveyed companies manage AI costs, up from 31% two years earlier, with cost visibility becoming a top challenge. This trend indicates a shift from “whether there is expenditure” to “the relationship between expenditure and output,” reflecting a structural reevaluation of AI spending.

All these efforts share a common logic: measure the quality and quantity of tasks completed by intelligent agents, rather than resource consumption alone—supporting the proposition that “the real asset is not tokens, but the intelligent agent itself.”

4. Comparing Token Metrics and Price Wars

4.1 Token-Centric vs. Agent-Centric Approaches

The token measurement camp’s core stance traces back to a statement by Nvidia CEO Jensen Huang (Huang Renxun) at GTC 2026: “If a $500k/year engineer can’t burn through $250k worth of tokens in a year, I’d be worried.” The statement advocates token budgets as an indicator of productivity input, and its core assumption is that token consumption correlates positively with value creation.

However, this assumption faces multiple challenges. Uber COO Andrew Macdonald pointed out: “It’s difficult to directly link individual employee productivity improvements to overall business impact.” In practice, employees often use AI to handle “undesirable tasks” rather than “most valuable work for the company.” Financially, only 14% of CFOs report clear, measurable ROI from AI investments. Uber exhausted its annual token budget without corresponding performance gains. All evidence points to a common conclusion: there’s no verified causal link between token budgets and business growth; tokens should not serve as evaluation scales.

4.2 The Double-Edged Sword of Token Price Wars

Intense competition over token pricing adds a new dimension to the measurement debate. In April 2026, OpenAI’s GPT-5.5 Pro API priced input at $30 and output at $180 per million tokens, several times higher than GPT-5.4 Pro. Meanwhile, DeepSeek cut the V4-Pro’s discounted price to 0.025 yuan per million tokens, and Xiaomi’s MiMo-V2.5-Pro cache hit price also dropped to 0.025 yuan. The divergence in token prices across providers now exceeds any traditional commodity market’s price gradient. For the same infrastructure, input costs per million tokens can range from less than 0.03 yuan to about 210 yuan (roughly $30).

This dynamic fundamentally threatens the credibility of token as a measurement scale: if the same token consumption can cost hundreds or thousands of times more across vendors, how can token consumption be a reliable basis for cross-company AI performance comparison? For investors and analysts, basing performance forecasts on token consumption introduces increasing bias. The token-based measurement is rapidly fragmenting, and the “input scale” measured by consumption is losing its reference value.
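The spread can be checked with a few lines of arithmetic. The sketch uses only the prices quoted above and the exchange rate implied by "about 210 yuan (roughly $30)", that is, about 7 yuan per US dollar; that rate is an approximation, not a market quote.

```python
CNY_PER_USD = 7.0  # implied by "about 210 yuan (roughly $30)"; an approximation

# Input price per million tokens, as quoted in the text.
gpt_5_5_pro_input_cny = 30.0 * CNY_PER_USD  # $30 per million tokens
cache_hit_input_cny = 0.025                 # discounted cache-hit price, in yuan

spread = gpt_5_5_pro_input_cny / cache_hit_input_cny
print(f"Highest quoted input price: {gpt_5_5_pro_input_cny:.0f} yuan per million tokens")
print(f"Price spread: about {spread:,.0f}x")

# The same 1 billion input tokens billed at each price.
tokens_millions = 1_000
cheapest_bill = tokens_millions * cache_hit_input_cny
priciest_bill = tokens_millions * gpt_5_5_pro_input_cny
print(f"Cheapest bill: {cheapest_bill:,.0f} yuan")
print(f"Most expensive bill: {priciest_bill:,.0f} yuan")
```

Two companies reporting identical token consumption could therefore be describing bills that differ by several thousand times.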

5. Facts Speak Louder Than Words

Scenario 1: Meta’s “Claudeonomics” Failure

In April 2026, a Meta employee developed an internal dashboard called “Claudeonomics” that tracked the token usage of more than 85,000 employees. Data showed total consumption of about 60 trillion tokens in 30 days, roughly $9 billion at Anthropic’s publicly available prices. The top individual consumed about 2.81 billion tokens in a month, worth over $140,000.

This case vividly illustrates the three stages of token-driven incentive: first, using token volume to motivate AI tool adoption; second, employees actively seek or generate token-consuming tasks to maintain rankings; third, company resources are wasted on ineffective consumption, with output quality far below expectations. Meta eventually shut down the leaderboard.

Scenario 2: Anthropic’s Computing Bottleneck and Expansion

Another aspect of token measurement is the cost and compute pressure on model vendors. In May 2026, to alleviate Claude’s capacity limits, Anthropic announced it would take over all computing at SpaceX’s Colossus 1 data center, gaining over 300 MW of additional capacity and more than 220k Nvidia GPUs. The agreement specifies that this new capacity will directly enhance service for Claude Pro and Claude Max subscribers. This expansion reveals the high dependence of token-based valuation on compute supply, and also indicates that the long-term stability of token valuation remains uncertain.

Scenario 3: Widespread Pressure from Corporate Token Bills

Reports indicate that Microsoft temporarily restricted employee use of Claude Code. Uber exhausted its annual AI token budget within four months. Companies such as Shopify, Spotify, ServiceNow, and Roku have all mentioned AI as a major operating expense in earnings calls. When token bills inflate rapidly and begin to affect quarterly financials, companies start to examine systematically the relationship between token consumption and business output.

Scenario 4: Positive Cases of Agents as Assets

While token-based narratives face challenges, some companies that focus on evaluating agents themselves are following a different path. Anthropic’s enterprise service strategy has been notably successful: although its active consumer user base is less than 2% of ChatGPT’s, its annual revenue has been steadily catching up with OpenAI’s. Media reports indicate that Anthropic’s annual revenue was about $9 billion at the end of 2025 and surpassed $30 billion by March 2026, overtaking OpenAI’s level of $25 billion. One key reason is that Anthropic’s agents perform real tasks such as contract processing, data analysis, and supply chain scheduling. Users may not see the agents, but the agents generate steady value every day.

According to The Information and other media, Claude Code’s annual revenue has grown rapidly from 2025 to early 2026. Companies pay for the quality of task completion, not just compute consumption—this strongly supports the proposition that “the true asset is the intelligent agent.”

6. From Token Asset-Centric to Agent Asset-Centric

In summary, the trends are becoming increasingly clear.

First, token consumption as an efficiency metric has fundamental flaws. It confuses input with output, cost with value; it lacks cross-model, cross-scenario measurement standards; it separates evaluation from operational goals, leading to severe incentive misalignments. Internal practices at Meta and Amazon confirm this.

Second, intelligent agents are becoming the most substantive value carriers in the AI economy. Their defining feature is “completing task units,” not “consuming compute units.” A highly efficient agent may use very few tokens to accomplish complex tasks; an inefficient agent may consume large amounts of tokens without solving real problems. Therefore, token consumption neither reflects an agent’s capability boundary nor predicts ROI.

Third, the industry is shifting from a token-centered to an agent-centered evaluation system. Benchmarks like SWE-bench for task completion provide a framework for cross-agent comparison; business metrics like DAA (Daily Active Agents) aim to measure economic contribution at the agent operation level; companies are exploring performance indicators based on output quality.

In conclusion, the true asset is not tokens, but the agent itself. Tokens are the fuel for agent operation, but a company’s competitiveness depends on engine efficiency, not tank capacity. The transition from token-centric to agent-centric measurement paradigms will be one of the main themes in the AI industry’s evaluation system overhaul over the next three to five years.


Tokens are not real assets; intelligent agents are.

1. Massive Token Consumption by Tech Company Employees