This is a mid-range model, the "hardest-working" member of the Sonnet series. It scored 63.2 on the agent capability test SWE-bench Pro, just 6 points shy of the flagship Opus 4.8's 69.2. On the graduate-level reasoning test GPQA-AAA v2, Sonnet 5 actually edged out Opus 4.8.

Pricing matters even more. During the promotional period, input tokens cost $2 per million and output tokens $10 per million. Opus 4.8's corresponding prices are $5 and $25. Sonnet 5 delivers over 90% of flagship capability at 40% to 60% of the price.
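The ratios quoted above follow directly from the listed figures. A minimal sketch, using only the numbers in the text (the variable names are illustrative, and a benchmark score ratio is only a rough proxy for "capability"):

```python
# Prices are USD per million tokens, as quoted for the promotional period.
sonnet_5 = {"input": 2.0, "output": 10.0, "swe_bench_pro": 63.2}
opus_4_8 = {"input": 5.0, "output": 25.0, "swe_bench_pro": 69.2}

input_price_ratio = sonnet_5["input"] / opus_4_8["input"]
output_price_ratio = sonnet_5["output"] / opus_4_8["output"]
capability_ratio = sonnet_5["swe_bench_pro"] / opus_4_8["swe_bench_pro"]

print(f"input price:         {input_price_ratio:.0%} of flagship")
print(f"output price:        {output_price_ratio:.0%} of flagship")
print(f"SWE-bench Pro score: {capability_ratio:.1%} of flagship")
```

At the promotional prices both price ratios come out at 40%. The upper end of the 40% to 60% range presumably reflects pricing outside the promotion, which the text does not list.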

This news can be read in two ways.

First: AI is getting cheaper again. Falling costs benefit everyone, the chatbot war continues, and model makers are fiercely competing.

Second—and what the market is actually pricing in—the cheaper models become, the more expensive computing power and storage get.

On the day Claude Sonnet 5 launched, the US semiconductor index rose nearly 4%. There has been a clear narrative in AI over the past three years: inference efficiency will kill chip demand. But this judgment has been wrong at every data point.

Price Cuts: 1,000 Times in Three Years

Let's look at the price reduction trend first.

In 2022, the API call cost for GPT-4-level models was about $0.03 per thousand tokens. By 2025, according to the Stanford AI Index Report, the price for models at an equivalent performance level had fallen about 280-fold. Combined with the effects of open source and efficiency improvements, the industry-wide consensus is a 1,000-fold reduction.
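To put the cumulative figures on a per-year footing, the sketch below converts a total price reduction into the constant annual factor that would compound to it. It assumes a three-year window (2022 to 2025) and a smooth decline, both of which are simplifications; `annual_factor` is an illustrative helper, not something taken from the cited report.

```python
def annual_factor(total_fold: float, years: float) -> float:
    """Constant yearly reduction factor that compounds to `total_fold`."""
    return total_fold ** (1.0 / years)

start_price_per_1k = 0.03  # USD per thousand tokens, the 2022 figure in the text

for total_fold in (280, 1000):
    per_year = annual_factor(total_fold, years=3)
    end_price = start_price_per_1k / total_fold
    print(f"{total_fold}x over 3 years is about {per_year:.1f}x cheaper per year; "
          f"$0.03 per 1k tokens becomes ${end_price:.6f}")
```

Under these assumptions, a 1,000-fold drop over three years corresponds to prices falling roughly tenfold every year.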

It's not just one model; every company is cutting prices.

This time, Anthropic's Sonnet 5 comes close to Opus 4.8 in capability at only 40% to 60% of the price. Google's Gemini Omni Flash generates video at $0.10 per second, and the Nano Banana 2 Lite image model outputs 4-second images at just $0.034 per thousand, half the cost of the previous generation. DeepSeek-V4-Pro has driven the price of one million input tokens down to $0.035.

Price cuts aren't just on the pricing sheet.

On June 24, The Information reported that OpenAI had found a pure software optimization that cut GPU demand for a certain computing operation by more than half, with the dedicated GPU pool dropping from thousands to just hundreds. That same month, Meta proposed the Vistara plan: reconnecting DDR4 memory from retired servers via in-house CXL chips and pairing it with DDR5 in a 3:1 ratio, reducing inference server costs by 25%.

On June 30, Jieyue open-sourced the speculative decoding technology JetSpec, which speeds up large model inference by nearly 10 times. In other words, the same token output would require an order of magnitude fewer GPUs.

If AI were a traditional cost-demand function, these signals should point to one thing: fewer chips will be needed in the future.

Wall Street feared exactly this.

Over the weekend of January's DeepSeek R1 launch, AI infrastructure stocks experienced the sharpest sell-off in recent years. AI cloud company Nebius's stock plummeted 40%. The story was simple: Chinese open-source models sell tokens for $0.1, while US companies charge $2—computing demand must collapse.

Explosion: Total Spending More Than Triples

But what actually happened was the complete opposite.

Nebius co-founder Roman Chernin later recalled that the week DeepSeek triggered panic "might have been our best sales week ever." Company procurement departments, seeing the sudden cost drop, did not cut budgets—they finally had the green light to run inference at scale.

In 2024, global enterprise spending on generative AI totaled about $11.5 billion. In 2025, that number surged to $37 billion, roughly 3.2 times the prior year's level (an increase of about 220%). According to Menlo Ventures' enterprise survey, the median company was running "dozens" of AI applications in 2025, compared to just 1 or 2 in 2023.
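A growth multiple and a percentage increase are easy to conflate, so it is worth checking the two spending figures directly. The sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
spend_2024_bn = 11.5  # USD billions, from the text
spend_2025_bn = 37.0  # USD billions, from the text

multiple = spend_2025_bn / spend_2024_bn
pct_increase = (multiple - 1) * 100

print(f"2025 spending is {multiple:.2f}x the 2024 level")
print(f"that is an increase of {pct_increase:.0f}%")
```

Spending at about 320% of the prior year's level is the same thing as an increase of about 220%.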

Data across every dimension follows the same curve:

Uber had already exhausted its full-year AI budget by April 2026. AT&T now processes 27 billion tokens daily—18 months ago, that number was 800 million. A large US health insurance company saw its monthly token consumption leap from 3 million to over 150 million.

Breaking it down, growth comes from three overlapping directions.

First is application diffusion. Each company's marketing department uses 3 AI tools, sales 4, customer service 2, plus legal, HR, finance—going from 2 to dozens is an order of magnitude jump.

Second is depth per application. Take customer service AI: in 2023, daily interactions were about 500, each about 800 tokens, ending after the conversation. By 2025, daily interactions reached 15,000, each about 4,500 tokens, and each interaction triggers 3 to 5 subsequent inference tasks—sentiment analysis, escalation prediction, quality scoring—all stacked on the same entry point.
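The customer service example can be turned into daily token counts. The interaction counts and tokens per interaction come from the text; the token cost of each follow-up task is not given, so `ASSUMED_FOLLOWUP_TOKENS` below is a purely hypothetical placeholder, and `daily_tokens` is an illustrative helper.

```python
def daily_tokens(interactions: int, tokens_each: int,
                 followups: int = 0, tokens_per_followup: int = 0) -> int:
    """Tokens per day: each interaction plus its follow-up inference tasks."""
    return interactions * (tokens_each + followups * tokens_per_followup)

tokens_2023 = daily_tokens(500, 800)
tokens_2025_base = daily_tokens(15_000, 4_500)
print(f"2023: {tokens_2023:,} tokens/day")
print(f"2025 (conversations only): {tokens_2025_base:,} tokens/day, "
      f"{tokens_2025_base / tokens_2023:.2f}x")

ASSUMED_FOLLOWUP_TOKENS = 1_000  # hypothetical; the text gives no figure
for n_followups in (3, 5):
    total = daily_tokens(15_000, 4_500, n_followups, ASSUMED_FOLLOWUP_TOKENS)
    print(f"2025 with {n_followups} follow-ups: {total:,} tokens/day, "
          f"{total / tokens_2023:.2f}x")
```

Even before counting follow-up tasks, the quoted figures imply well over a hundredfold increase in daily tokens for a single application.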

Third is the complexity upgrade of the models themselves. From 7B-parameter single-turn models to 70B+ multi-step reasoning agents, each internal inference round consumes tens to hundreds of times more tokens than linear interactions.

In other words, token costs fell to 1/1,000th of their earlier level, while the market consumed tens of thousands of times more tokens. The net effect runs in only one direction: spending exploded.
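The net effect is simple arithmetic: spending is unit price times volume, so the change in spending is the volume multiple divided by the price reduction. A minimal sketch, where the volume multiples are illustrative readings of "tens of thousands" rather than figures from the text:

```python
def spend_multiple(price_fold_cheaper: float, volume_multiple: float) -> float:
    """Change in total spending when unit price falls and volume rises."""
    return volume_multiple / price_fold_cheaper

PRICE_FOLD = 1_000  # tokens are 1,000 times cheaper, per the text

for volume in (1_000, 10_000, 50_000):  # illustrative volume multiples
    print(f"volume x{volume:,}: spending x{spend_multiple(PRICE_FOLD, volume):g}")
```

Spending rises only when volume grows by more than the price fell. With prices down 1,000-fold, the break-even point is a 1,000-fold increase in tokens consumed.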

Token consumption doubles every two months—multiple independent data points converge on the same number. Extend this exponential curve to 2027, and enterprise AI annual spending exceeding $7B is an arithmetic problem, not a prediction.
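A doubling time can be backed out of any two observations, assuming smooth exponential growth between them. The sketch below does this for the AT&T figures quoted earlier and shows what a two-month doubling time compounds to over a year. The helper names are illustrative, and a single company's series need not match an industry-wide rate.

```python
import math

def doubling_time(start: float, end: float, months: float) -> float:
    """Months per doubling, assuming exponential growth from start to end."""
    return months * math.log(2) / math.log(end / start)

def growth_over(months: float, doubling_months: float) -> float:
    """Total growth multiple after `months` at a fixed doubling time."""
    return 2 ** (months / doubling_months)

# AT&T: 800 million to 27 billion tokens per day over 18 months (from the text).
att_doubling = doubling_time(800e6, 27e9, 18)
print(f"AT&T implied doubling time: {att_doubling:.1f} months")

# A two-month doubling time, compounded over one year.
print(f"Two-month doubling over 12 months: {growth_over(12, 2):.0f}x")
```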

Transmission: Storage Up Sixfold, Chip Infrastructure Towards $7.6 Trillion

The demand stimulated by price cuts hasn't stopped at the software layer.

The price increase in memory is the most direct signal of AI demand transmitting from the model layer to the hardware layer.

Starting in Q3 2025, spot prices for DRAM and NAND Flash have both risen over 300%. DDR5 modules saw monthly increases exceeding 90% at one point. Entering 2026, the price hikes haven't stopped—they've accelerated.

In Q1 2026, DRAM contract price increases were revised from an expected 55%-60% to 90%-95%; NAND from 33%-38% to 55%-60%. For Q2, TrendForce forecasts DRAM to rise another 58%-63%, and NAND another 70%-75%.
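Quarterly increases compound rather than add. The sketch below chains the revised Q1 ranges with the Q2 forecast ranges quoted above to give a two-quarter cumulative multiple. Note that the Q2 numbers are forecasts, so the result is conditional on them; `compound` is an illustrative helper.

```python
def compound(*pct_increases: float) -> float:
    """Cumulative price multiple after a sequence of percentage increases."""
    multiple = 1.0
    for pct in pct_increases:
        multiple *= 1 + pct / 100
    return multiple

# (low, high) percentage increases per quarter, as quoted in the text.
ranges = {
    "DRAM": {"q1": (90, 95), "q2": (58, 63)},
    "NAND": {"q1": (55, 60), "q2": (70, 75)},
}

results = {}
for name, r in ranges.items():
    low = compound(r["q1"][0], r["q2"][0])
    high = compound(r["q1"][1], r["q2"][1])
    results[name] = (low, high)
    print(f"{name}: {low:.2f}x to {high:.2f}x over two quarters")
```

If the forecasts hold, DRAM contract prices would roughly triple over the first half of 2026.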

Using consumer-grade products as a benchmark: an Acer Predator 32GB DDR5 6000 dual-channel kit was priced at around 1,300 yuan at the end of October 2025, but by January 2026 it had soared to 2,700 yuan. A doubling in three months is extremely rare in consumer markets.

Samsung's memory business recorded its highest quarterly operating profit ever in Q4 2025—exceeding 20 trillion won, or about 96.2 billion RMB. The fundamental driver of this price surge over the past year is not consumer upgrades from phones or PCs, but massive AI data center procurement of HBM, enterprise SSDs, and high-density DRAM.

A Goldman Sachs report in May took this calculation to the extreme.

The report predicts that global AI infrastructure cumulative capital expenditure from 2026 to 2031 will be about $7.6 trillion, with $765 billion in 2026 alone, climbing to $1.6 trillion by 2031. The baseline assumes a single GPU (based on NVIDIA VR200 Rubin) priced at $80,500, with NVIDIA accounting for 75% of total computing spending in each period.
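The endpoints quoted from the report imply a growth rate. The sketch below computes the compound annual growth rate between the 2026 and 2031 figures and the simple average over the six-year window. It uses only the three numbers in the text and says nothing about the report's actual year-by-year path.

```python
capex_2026_bn = 765.0     # USD billions, from the text
capex_2031_bn = 1_600.0   # USD billions, from the text
cumulative_bn = 7_600.0   # USD billions, 2026 through 2031

years_between = 2031 - 2026  # five compounding steps
cagr = (capex_2031_bn / capex_2026_bn) ** (1 / years_between) - 1
average_per_year_bn = cumulative_bn / 6  # six calendar years inclusive

print(f"implied CAGR 2026 to 2031: {cagr:.1%}")
print(f"average annual capex: ${average_per_year_bn:,.0f} billion")
```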

Goldman Sachs also asked a key question in the report: If ASICs (application-specific chips) massively replace GPUs, can it reduce total demand?

The answer depends on the situation. If demand is inelastic—enterprise AI computing demand is fixed—ASIC substitution can directly reduce total capital requirements. But if demand is elastic—cheaper computing means buying more—changes in chip mix mainly reshape profit distribution among different suppliers, not the total spending scale.
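The elastic versus inelastic distinction can be made concrete with a constant-elasticity demand curve. This functional form is an assumption chosen for illustration, not the model used in the Goldman Sachs report. With quantity proportional to price raised to the power of minus the elasticity, a price cut by a factor f changes total spending by f raised to (elasticity minus 1).

```python
def spend_multiple_after_cut(price_cut_factor: float, elasticity: float) -> float:
    """Spending change when unit price falls by `price_cut_factor`.

    Assumes constant-elasticity demand: quantity ~ price ** (-elasticity),
    so spending = price * quantity ~ price ** (1 - elasticity).
    """
    return price_cut_factor ** (elasticity - 1)

CUT = 2.0  # hypothetical: cheaper chips halve the unit cost of compute

for elasticity in (0.0, 1.0, 1.5):  # illustrative values
    change = spend_multiple_after_cut(CUT, elasticity)
    print(f"elasticity {elasticity}: total spending x{change:.2f}")
```

An elasticity of 0 is the fixed-demand case, where halving the unit cost halves total spending. An elasticity above 1 is the case where cheaper computing leads to higher total spending.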

Goldman Sachs's base scenario chose the latter.

US stock prices are also heading in the same direction. SanDisk has risen 857% since the start of the year; Bernstein raised its target price to $3,000 in a June 30 report. AMD surged 7% in a single day to an all-time high. GPU makers, memory makers, packaging companies, data center equipment providers—all near new highs.

The most striking figure cited in an Edgen.tech summary article on June 11: memory chip prices have increased sixfold over the past year.

The "cyclical recovery" label doesn't fit. When something goes up sixfold, it reflects the entire economy's demand repricing of AI's physical infrastructure.

Root Cause: Jevons Already Answered This in 1865

William Stanley Jevons wrote a book in 1865 called The Coal Question.

His core observation was this: after Watt improved the steam engine, coal consumption per unit of work dropped sharply, but Britain's total coal consumption actually rose. Efficiency gains made steam power cost-effective in more industries (textiles, railways, mining, shipping), and each new scenario created demand for coal that had not existed before.

160 years later, the same formula is playing out with AI computing.

Companies did the math. At 2022 token prices, real-time inference for customer service conversations was economically infeasible. Non-urgent scenarios weren't worth running AI. Personalized content generation could only be done at the segment level, not the user level. By 2025, with prices 1,000 times lower, all these "non-existent demands" became urgent needs.

Nebius's Chernin gave the most direct summary: "Every time we make the same unit of intelligence cheaper, we don't reduce consumption—we increase it—because the same budget can now solve more complex tasks."

The market overlooked another structural driver: the positive feedback loop of gross margins.

The gross margin curve for AI inference has no historical parallel. A company offering an API might start with only 10% gross margins—model training is expensive, inference is expensive. But software optimizations (operator fusion, quantization, speculative decoding) cut inference costs every month, while pricing adjustments always lag. So gross margins climb from 10% to 90% faster than any traditional industry.
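The margin mechanics can be sketched under a deliberately simple assumption: the selling price stays fixed (the pricing lag) while unit cost falls by a constant fraction each month. The monthly reduction rates below are hypothetical, since the text gives none, and the helper names are illustrative.

```python
def gross_margin(price: float, cost: float) -> float:
    return 1 - cost / price

def months_to_margin(start_margin: float, target_margin: float,
                     monthly_cost_cut: float, price: float = 1.0) -> int:
    """Months until the margin reaches the target at a fixed selling price."""
    if not 0 < monthly_cost_cut < 1:
        raise ValueError("monthly_cost_cut must be between 0 and 1")
    cost = price * (1 - start_margin)
    months = 0
    while gross_margin(price, cost) < target_margin:
        cost *= 1 - monthly_cost_cut  # software optimizations lower unit cost
        months += 1
    return months

for cut in (0.15, 0.30):  # hypothetical monthly cost reductions
    print(f"{cut:.0%} monthly cost cut: "
          f"{months_to_margin(0.10, 0.90, cut)} months from 10% to 90% margin")
```

Moving from a 10% to a 90% margin at a fixed price means unit cost falling from 90% to 10% of the price, a ninefold reduction.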

Gross margins drive profits, profits drive more procurement, procurement spreads costs—a positive feedback loop with no ceiling.

"If you have DRAM, you can sell tokens. If you don't have DRAM, you can't sell tokens." This phrase is becoming the fundamental equation of AI chip demand.

Two sensitivity assumptions in the Goldman Sachs report reinforce the same judgment. If chip economic life shrinks from 5 years to 3 years, replacement cycles accelerate, and cumulative capital needs jump. If memory per chip is 25% higher than expected—mainly changing the spending distribution within the chip stack—the net impact on the $7.6 trillion total is limited, but the direction is the same: spending won't decrease.
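The first sensitivity can be illustrated with a steady-state simplification: if a fleet of a given value must be fully replaced once per economic life, annual replacement spending is the fleet value divided by that life. This is a back-of-envelope model rather than the report's methodology, and the fleet value is a hypothetical placeholder.

```python
def annual_replacement_spend(fleet_value: float, economic_life_years: float) -> float:
    """Steady-state yearly spend needed to replace a fleet over its economic life."""
    return fleet_value / economic_life_years

FLEET_VALUE_BN = 1_000.0  # hypothetical installed base, USD billions

spend_5y = annual_replacement_spend(FLEET_VALUE_BN, 5)
spend_3y = annual_replacement_spend(FLEET_VALUE_BN, 3)
ratio = spend_3y / spend_5y

print(f"5-year life: ${spend_5y:,.0f} billion per year")
print(f"3-year life: ${spend_3y:,.0f} billion per year ({ratio:.2f}x)")
```

Whatever fleet value is assumed, shortening the economic life from 5 years to 3 raises steady-state replacement spending by the same factor of 5/3.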

Endgame: Who Holds the Computing Power?

The lifting of Fable 5 export controls—banned on June 12, lifted on June 30, just three weeks—provided an unexpected footnote to this paradox.

The rationale for controls was "national security risk." Lifting the controls has nothing to do with the risk disappearing—substitutes emerged. Asian teams like Tulongfeng released models close to Mythos-level during the control period, instantly nullifying the blockade's deterrent. The unban was a matter of reality, not goodwill.

This episode fits perfectly into the AI cost-reduction paradox: models are substitutable. From GPT to Claude to DeepSeek to open-source models, no one can monopolize AI capability itself—if someone puts up a barrier, someone finds a way around.

Hardware doesn't follow this logic.

GPUs don't. DRAM doesn't. Fab construction cycles are measured in years. Lithography machine production capacity is fixed. High-purity silicon supply elasticity is near zero. These are laws of physics, not business strategies. Software optimizations can reduce model costs a thousandfold, but they can't shave a single day off a fab construction cycle.

If this paradox continues, the endgame of AI model price cuts does not point to decoupling from computing power—it points to a re-concentration of computing power pricing power. No matter whose model you use, tokens have to run on someone's chips. Every dollar that model makers slash in price ends up as revenue on the books of data centers, fabs, and memory production lines. The more aggressive the cost cuts, the more irreversible this shift.

Risk Warning and Disclaimer

Market risk exists; investment requires caution. This article does not constitute personal investment advice, nor does it consider the specific investment objectives, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article are suitable for their specific circumstances. Investment based on this content is at your own risk.
