SemiAnalysis: From infrastructure to the model layer, the wealth migration along the AI value chain is accelerating.

The value center of gravity in the AI industry is undergoing a structural shift.

Over the past two years, Nvidia, memory manufacturers, and energy suppliers have captured most of the returns on AI investment. However, as the commercialization of Agentic AI accelerates, profit margins at the model layer are expanding at an unprecedented pace, while Nvidia and TSMC, which control the supply of computing power, have not yet fully reflected this trend in their pricing.

Anthropic is the clearest illustration of this shift.

According to the latest research from SemiAnalysis, Anthropic's annualized recurring revenue (ARR) has surged from $9 billion at the beginning of the year to over $44 billion, and its gross margin on inference infrastructure has jumped from 38% to over 70% over the same period. Meanwhile, hardware iterations and software optimizations have sharply reduced the cost of producing tokens. The widening gap between the value of a token and its cost is pushing model providers into a new phase of rapidly rising profit margins.

On the supply side, Nvidia and TSMC hold the scarcest resources but have not yet fully adjusted their pricing to the current demand boom. SemiAnalysis believes this pricing lag constitutes a significant market dislocation: next-generation systems such as Vera Rubin (VR NVL72) have substantial room for price increases, and whoever moves first in this redistribution of value will profoundly affect the investment logic across the AI industry chain.

Three-Year Migration Path of AI Value Pools

Between 2023 and 2025, excess returns from AI investments were mainly concentrated in the infrastructure layer.

Nvidia released a blockbuster earnings report in May 2023, and its shares surged about 25% in after-hours trading, kickstarting the AI investment wave. In 2024, Vistra and GE Vernova rose 265% and 146% respectively, becoming the best-performing stocks in the S&P 500 as energy bottlenecks became a market focus. In 2025, the memory sector took the lead: SanDisk, Western Digital, Seagate, and Micron all recorded annual gains of over 200%, and the imbalance between storage supply and demand became a core variable driving pricing.

Meanwhile, the gross margins of model providers and inference service providers remained under pressure for a long time. Critics at the time regarded AI's practical utility as little more than "a better Google search" plus a chat interface, far too little to justify the trillions of dollars in expected capital expenditure.

This pattern underwent a fundamental change at the end of 2025.

Agentic AI: The Inflection Point Reshaping Tokenomics

SemiAnalysis regards December 2025 as the true inflection point for AI commercialization: Agentic AI began operating reliably and was deployed at scale in enterprise workflows. The significance of this change is that it fundamentally altered the economic value of a token.

SemiAnalysis cites itself as an example: its annualized token spending has already reached approximately 30% of total employee compensation, with each employee consuming over 5 billion tokens per month, more than five times the per capita level at Meta. The research team also cited multiple real-world cases: tasks such as financial modeling, chart creation, and earnings analysis, which previously took junior analysts hours, can now be completed by agents at very low token cost, whereas the equivalent labor used to cost hundreds to thousands of dollars.

At the same time, the cost of producing tokens is plummeting. SemiAnalysis estimates that in agent task scenarios, the actual blended price for running Opus 4.7 is about $0.99 per million tokens, far below the official list price of $5/$25 per million input/output tokens. The reason is that agent workloads have a very high input-to-output ratio (approximately 300:1) and a cache hit rate above 90%, so most tokens fall into the lowest price tier.
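The blended figure can be roughly reproduced with a short calculation. The sketch below is illustrative only and is not SemiAnalysis's model: it assumes cached input tokens are billed at 10% of the input list price (the `cache_read_discount` parameter is an assumption, not a number from the article) and it ignores any premium for writing to the cache.

```python
def blended_price_per_million(
    input_price: float,
    output_price: float,
    input_output_ratio: float,
    cache_hit_rate: float,
    cache_read_discount: float = 0.10,  # ASSUMPTION: cached reads cost 10% of list
) -> float:
    """Blended dollars per million tokens across cached input, uncached input, and output."""
    cached_input_price = input_price * cache_read_discount
    # Average price of an input token, weighted by the cache hit rate.
    effective_input_price = (
        cache_hit_rate * cached_input_price
        + (1.0 - cache_hit_rate) * input_price
    )
    # For every `input_output_ratio` input tokens there is one output token.
    total_tokens = input_output_ratio + 1.0
    return (input_output_ratio * effective_input_price + output_price) / total_tokens


if __name__ == "__main__":
    # List price $5/$25 and the 300:1 ratio are taken from the article.
    for hit_rate in (0.90, 0.91, 0.95):
        price = blended_price_per_million(5.0, 25.0, 300.0, hit_rate)
        print(f"cache hit rate {hit_rate:.0%}: ${price:.2f} per million tokens")
```

Under these assumptions, a hit rate of 90% gives roughly $1.03 per million tokens and 91% gives roughly $0.99, which is in line with the estimate quoted above for a cache hit rate "above 90%".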

Hardware acceleration is also significant. Compared to the H100 from a year ago, the Blackwell series can generate about 30 times more tokens per second on frontier workloads. Further comparisons show that the optimally configured GB300 NVL72 achieves about 17 times the throughput of the optimally configured H100 at FP8 precision, and when switching to FP4, this gap expands to 32 times, while the total cost of ownership (TCO) is only about 70% higher.
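To see what these multiples imply for the cost of a token, the throughput gain can be divided by the TCO increase. This is a back-of-the-envelope sketch: it assumes the 70% TCO increase and the throughput multiples are measured on the same per-GPU basis, which the article does not state explicitly.

```python
def cost_per_token_ratio(throughput_multiple: float, tco_multiple: float) -> float:
    """New cost per token as a fraction of the old cost per token."""
    return tco_multiple / throughput_multiple


TCO_MULTIPLE = 1.70  # "about 70% higher" TCO, from the article

for label, throughput in (("FP8", 17.0), ("FP4", 32.0)):
    ratio = cost_per_token_ratio(throughput, TCO_MULTIPLE)
    print(f"{label}: cost per token is {ratio:.1%} of the H100 level, "
          f"about a {1.0 / ratio:.1f}x reduction")
```

On these numbers, cost per token falls by roughly 10x at FP8 and roughly 19x at FP4.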

This scissors gap, with the value of a token rising while its cost falls, is the core driver behind Anthropic's gross margin leaping from 38% to over 70%.
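The size of that margin move is easier to appreciate when expressed as cost per dollar of revenue. The snippet below is plain arithmetic on the two margin figures quoted in the article, using the standard definition of gross margin as (revenue - cost) / revenue; it treats "over 70%" as exactly 70%.

```python
def cost_share(gross_margin: float) -> float:
    """Cost of serving one dollar of revenue."""
    return 1.0 - gross_margin


def price_to_cost_multiple(gross_margin: float) -> float:
    """Revenue earned per dollar of cost."""
    return 1.0 / (1.0 - gross_margin)


for margin in (0.38, 0.70):
    print(f"gross margin {margin:.0%}: cost is ${cost_share(margin):.2f} per revenue dollar, "
          f"revenue is {price_to_cost_multiple(margin):.2f}x cost")
```

In other words, the cost of serving one dollar of revenue drops from about $0.62 to about $0.30, so each dollar of inference cost supports roughly twice as much revenue as before.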

Model Layer Pricing Power: Why It Won't Be Eroded by Competition

The most common objection to model providers' rapidly expanding profit margins is that competition will eventually drive prices down. SemiAnalysis disagrees and offers two supporting arguments.

First, the pricing power of frontier closed-source models remains solid. Although open-source models continue to break benchmark records, their performance in real knowledge work is still significantly weaker than that of closed-source frontier models. Kimi K2.6 (priced at $0.95/$4), for example, exerts very limited downward pressure on Anthropic's Opus pricing.

Second, compute constraints mean that no single frontier lab can meet the entire market's demand on its own. Anthropic has begun actively managing demand by gating Claude Code behind subscriptions costing over $100 per month and by restricting third-party access. Token demand will continue to exceed supply for the foreseeable future, and this structural scarcity lets frontier model providers price on value rather than cost.

Anthropic has already put this logic into practice through its product line: Opus fast is priced at 6 times regular Opus, and the upcoming Mythos is priced at $25/$125, 5 times regular Opus. Top enterprise customers are still willing to pay for these high-priced SKUs. SemiAnalysis states that if Anthropic priced Mythos fast at $150/$750, it would itself be a paying customer.
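The price ladder can be checked against the $5/$25 Opus list price quoted earlier in the article. The sketch below is illustrative arithmetic only: the helper names are hypothetical, the Opus fast price is derived from the stated 6x multiple rather than quoted by the source, and Mythos fast is SemiAnalysis's hypothetical, not an announced SKU.

```python
from typing import Tuple

Price = Tuple[float, float]  # (input, output) in dollars per million tokens

OPUS_LIST: Price = (5.0, 25.0)  # list price quoted in the article


def scale(price: Price, multiple: float) -> Price:
    """Apply the same multiple to the input and output price."""
    return (price[0] * multiple, price[1] * multiple)


def multiple_of(price: Price, base: Price) -> float:
    """Return the common multiple of `price` over `base`, or raise if they differ."""
    input_multiple = price[0] / base[0]
    output_multiple = price[1] / base[1]
    if abs(input_multiple - output_multiple) > 1e-9:
        raise ValueError("input and output prices scale by different multiples")
    return input_multiple


opus_fast = scale(OPUS_LIST, 6.0)    # derived from the stated 6x multiple
mythos = scale(OPUS_LIST, 5.0)       # matches the quoted $25/$125
mythos_fast: Price = (150.0, 750.0)  # hypothetical price floated by SemiAnalysis

print("Opus fast:", opus_fast)
print("Mythos:", mythos)
print("Mythos fast vs Mythos:", multiple_of(mythos_fast, mythos))
print("Mythos fast vs Opus:", multiple_of(mythos_fast, OPUS_LIST))
```

The hypothetical $150/$750 price works out to 6 times Mythos, the same premium that Opus fast carries over regular Opus, and 30 times the regular Opus list price.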

Nvidia and TSMC: Pricing Lag for Scarce Resources

However, the two companies that control the scarcest core resources, Nvidia and TSMC, have not fully kept pace with this wave of value reassessment.

TSMC's N3 advanced process capacity has become the tightest bottleneck in the entire AI computing power expansion. Nvidia, Broadcom, Annapurna, MediaTek, and AMD are all competing for limited N3 wafer allocations, and N3 capacity utilization is expected to exceed 100% in the second half of 2026. DRAM fab utilization has already exceeded 90%, with overall memory supply tight, but pricing remains relatively conservative.

SemiAnalysis believes TSMC is fully capable of raising prices significantly. Customers would not only accept an increase; some would even welcome it. Nvidia is a typical case: if higher TSMC prices mean that competitors receive smaller capacity allocations, paying more per wafer actually helps Nvidia consolidate its market position. This is the logic behind Nvidia CEO Jensen Huang's public statement in 2024 that TSMC should raise wafer prices.

Nvidia's own pricing strategy also shows a similar conservative tendency. SemiAnalysis points out that Nvidia's pricing framework remains anchored to the previous assumption that "the price users are willing to pay per unit of computing power declines over time," but this assumption is no longer valid. With the explosion of agent workloads, demand for computing power is no longer growing linearly but is showing a compound acceleration trend.

Rubin System: Quantifying Nvidia's Pricing Headroom

Using the Vera Rubin (VR NVL72), set to be released in the second half of 2026, as a reference, SemiAnalysis has built a "One Chart to Rule Them All" pricing analysis framework, anchoring the floor and ceiling of rental pricing from both the cost and value sides.

Cost side (floor): Neocloud projects require an internal rate of return (IRR) of at least 15.6% to be worth deploying. On that basis, the rental price of VR NVL72 needs to be at least about $4.92 per GPU per hour for Neocloud to remain willing to deploy.

Value side (ceiling): Anchoring on the current 5-year contract rental price of about $0.70 per PFLOP for GB300, the corresponding rental ceiling for VR NVL72 is about $12.25 per GPU per hour.

At current system pricing, the VR NVL72 brings the cost per PFLOP down to about $0.28, a 60% drop from the GB300 NVL72 and far steeper than the historical trend line. This means Nvidia's server price has about 40% room for increase: even after such an adjustment, Neocloud would still have a sufficient profit margin, and the resulting cost per PFLOP would still sit below the historical trend.
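The floor, ceiling, and per-PFLOP figures are mutually consistent, which a few lines of arithmetic can confirm. The sketch below back-calculates an implied PFLOPs-per-GPU figure from the article's own numbers; that value is an inference for illustration, not an Nvidia specification, and the sketch assumes the per-PFLOP prices are hourly rental rates.

```python
GB300_PRICE_PER_PFLOP = 0.70   # $/PFLOP, 5-year contract rental (from the article)
VR_CEILING_PER_GPU_HOUR = 12.25  # value-side ceiling (from the article)
VR_FLOOR_PER_GPU_HOUR = 4.92     # cost-side floor (from the article)

# ASSUMPTION: the ceiling is the GB300 per-PFLOP price applied to VR NVL72 compute,
# so the ratio gives the PFLOPs per GPU implied by the article's figures.
implied_pflops_per_gpu = VR_CEILING_PER_GPU_HOUR / GB300_PRICE_PER_PFLOP

floor_price_per_pflop = VR_FLOOR_PER_GPU_HOUR / implied_pflops_per_gpu
drop_vs_gb300 = 1.0 - floor_price_per_pflop / GB300_PRICE_PER_PFLOP
ceiling_to_floor = VR_CEILING_PER_GPU_HOUR / VR_FLOOR_PER_GPU_HOUR

print(f"implied PFLOPs per GPU: {implied_pflops_per_gpu:.1f}")
print(f"floor price per PFLOP: ${floor_price_per_pflop:.2f}")
print(f"drop vs GB300: {drop_vs_gb300:.0%}")
print(f"ceiling / floor: {ceiling_to_floor:.2f}x")
```

The floor works out to about $0.28 per PFLOP, roughly 60% below the GB300 figure, matching the numbers quoted above, and the ceiling is about 2.5 times the floor. Translating that rental band into the roughly 40% server price headroom would need the server's share of total cost, which the article does not give.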

SOCAMM memory pricing is another key variable. The VR NVL72 uses socketed LPDDR5X memory modules (SOCAMM), which can be priced independently of the compute units. SemiAnalysis estimates that the SOCAMM contract price Nvidia pays in Q1 2026 is about $8 per GB, a significant jump from the previous quarter, and expects the price may exceed $13 per GB by the end of 2026. Against this backdrop, a 60% gross margin on SOCAMM is a reasonable target for Nvidia: on one hand, memory supply is constrained and Nvidia holds the largest share of it; on the other hand, the VR NVL72's performance lead at the TCO level leaves customers with no alternatives.
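A 60% gross margin translates into a resale price as follows. This is a minimal sketch that assumes gross margin is defined on the selling price and that the contract price is Nvidia's only cost for the module; neither assumption comes from the article.

```python
def price_for_margin(unit_cost: float, gross_margin: float) -> float:
    """Selling price at which (price - cost) / price equals `gross_margin`."""
    if not 0.0 <= gross_margin < 1.0:
        raise ValueError("gross_margin must be in [0, 1)")
    return unit_cost / (1.0 - gross_margin)


TARGET_MARGIN = 0.60  # margin discussed in the article

# Contract prices per GB from the article: Q1 2026 estimate and end-2026 projection.
for label, cost_per_gb in (("Q1 2026", 8.0), ("end of 2026", 13.0)):
    price = price_for_margin(cost_per_gb, TARGET_MARGIN)
    print(f"{label}: cost ${cost_per_gb:.2f}/GB -> price ${price:.2f}/GB")
```

At a 60% margin the selling price is 2.5 times cost, so $8 per GB implies about $20 per GB to the customer and $13 per GB implies about $32.50.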

Value Destination: Who Wins, Who Waits

SemiAnalysis's framework reveals the core contradiction in current AI value distribution: improving tokenomics are rapidly boosting profits for model providers, inference service providers, and Neocloud, yet the pricing behavior of Nvidia and TSMC, which control the scarcest supply-side resources, is clearly out of step with that scarcity.

The persistence of this mismatch is essentially an active choice. Nvidia is playing a role similar to an "AI central bank," passing value downstream through software efficiency improvements to sustain long-term ecosystem expansion while avoiding antitrust pressure. TSMC, meanwhile, continues its historical pricing philosophy of "stabilizing the ecosystem without fully capturing upside gains."

However, as inference ROI becomes increasingly clear and value-based pricing spreads across the market, the pressure on these two companies to shift toward value-based pricing will continue to rise. Once the shift occurs, the value distribution of the AI industry chain will be reshaped once again, and the bargaining power of the computing supply side will largely return to the hardware layer.

Risk Warnings and Disclaimers

The market carries risks, and investment should be made with caution. This article does not constitute personal investment advice, nor does it consider the particular investment objectives, financial situations, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article are suitable for their specific circumstances. Investments made based on this article are at the user's own risk.