First, consider why Agentic AI's demand for CPUs differs so sharply from that of traditional LLMs. In the LLM era, workloads mapped onto large-scale parallel matrix operations, a natural fit for GPUs, and the GPU:CPU ratio in CSP training clusters surged from 3:1 in 2020 to 8:1 in 2024. The computing structure of Agentic AI is different. Agent orchestration, tool calling, context window management, multi-agent coordination, and human-machine interaction loops are all general-purpose sequential tasks that require low latency and high memory bandwidth, which makes them CPU work. The GPU's role shrinks to token generation itself; everything else reverts to the CPU. Bernstein projects that the GPU:CPU ratio will fall back to 2:1 by 2030, with CPUs' share of AI capex in inference clusters rising from 14% to 50%.

AMD gave a CPU TAM forecast of $60 billion at its Analyst Day in November 2025, then doubled it to $120 billion six months later on its Q1 2026 earnings call. Intel is even selling chips it had previously written down as obsolete, with customers scrambling to buy them. The supply-demand tightness is evident.

China's x86 server CPU TAM is expected to grow from $7 billion in 2025 to $27 billion in 2030, but the growth comes in two phases. From 2025 to 2028, the CAGR is about 31%, below the global 35%, because of two supply constraints. First, AMD and Intel capacity is absorbed preferentially by US hyperscalers, which reduces their allocation to China. Second, SMIC's advanced-process capacity is limited, which both restricts the volume of domestic AI accelerator cards (inadequate accelerator supply directly suppresses demand for the CPUs that accompany them) and limits the wafer allocation of Haiguang (Hygon) itself. After 2028, both constraints ease at the same time. A large release of advanced-node capacity at SMIC and other domestic foundries, rising domestic AI chip supply that accelerates overall server deployment, and catch-up AIDC investment led by local governments are expected to push China's CPU market CAGR to 36%, above the global rate.
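The two-phase growth path can be checked with simple compounding. The sketch below uses only the rounded figures quoted above ($7 billion in 2025, $27 billion in 2030, roughly 31% CAGR through 2028 and 36% afterwards); the function names are illustrative, not from any report.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


def compound(start, phases):
    """Apply a sequence of (annual_rate, years) phases to a starting value."""
    value = start
    for rate, years in phases:
        value *= (1 + rate) ** years
    return value


# Figures as quoted in the text, in USD billions (rounded inputs).
tam_2025, tam_2030 = 7.0, 27.0

implied_cagr = cagr(tam_2025, tam_2030, 5)
path_2028 = compound(tam_2025, [(0.31, 3)])
path_2030 = compound(tam_2025, [(0.31, 3), (0.36, 2)])

print(f"Implied 2025-2030 CAGR from endpoints: {implied_cagr:.1%}")
print(f"2028 TAM at 31% CAGR: ${path_2028:.1f}B")
print(f"2030 TAM at 31% then 36%: ${path_2030:.1f}B")
```

The endpoints alone imply a five-year CAGR of about 31%, while compounding the two quoted phase rates gives roughly $29 billion for 2030 rather than $27 billion. The gap may simply reflect rounding in the quoted rates, so the phase figures are best read as approximate.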

Haiguang's generational evolution is worth a close look. It started with an AMD Zen1 license in 2016 and lost AMD's subsequent technical support after being added to the Entity List in 2019, but the Zen1 IP license itself is not subject to retroactive revocation. From G2 to G4, under a 14nm process constraint (later switched to SMIC N+1, a 10nm equivalent), Haiguang achieved a 30-50% performance improvement per generation through its own improvements to the architecture, memory subsystem, and I/O. G4 already offers 64 cores, DDR5, and PCIe 5.0, with a SPEC2017 int-rate score of approximately 1000. That is comparable to Intel's 4th/5th-generation Xeon (2022-2023 products) and AMD Zen3 (2021 products), which implies a technology gap of 2-3 years. An interesting side validation is the StackWarp hardware vulnerability disclosed this year, which affects multiple generations of AMD Zen processors. Haiguang is completely unaffected because it replaced AMD's SEV-SNP security architecture with its self-developed CSV3. This indicates that the microarchitectural divergence is real, not a reskin.

G5 is the key to the entire growth logic. It has 128 cores, 512 threads (SMT4), 16-channel DDR5, CXL 2.0, and a fully self-developed microarchitecture, and it is expected to be manufactured on SMIC N+2 (7nm equivalent). Bernstein estimates that its SPEC2017 int-rate score could exceed 2000, against 2440 for Intel Granite Rapids (128 P-cores, launched September 2024) and 2089 for AMD Zen5 (2024). If G5 launches as planned, Haiguang's gap to the latest global products would shrink from "two generations behind" to "one generation behind."
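To put the quoted scores side by side, here is a minimal sketch that expresses each SPEC2017 int-rate figure from the text as a fraction of a chosen baseline. The G5 value of 2000 is Bernstein's estimate (quoted as a floor, "could exceed 2000"), not a measurement, and the dictionary keys and helper name are illustrative.

```python
# SPEC2017 int-rate scores as quoted in the text.
# The G4 score is approximate; the G5 score is an estimate, not a measured result.
scores = {
    "Haiguang G4 (approx.)": 1000,
    "Haiguang G5 (Bernstein estimate)": 2000,
    "AMD Zen5": 2089,
    "Intel Granite Rapids": 2440,
}


def relative_to(all_scores, baseline):
    """Return each score as a fraction of the baseline's score."""
    base = all_scores[baseline]
    return {name: score / base for name, score in all_scores.items()}


vs_granite_rapids = relative_to(scores, "Intel Granite Rapids")
for name, ratio in vs_granite_rapids.items():
    print(f"{name}: {ratio:.0%} of Granite Rapids")

# Thread count follows directly from the quoted core count and SMT level.
g5_cores, g5_smt = 128, 4
g5_threads = g5_cores * g5_smt
print(f"G5 hardware threads per socket: {g5_threads}")
```

On these figures, a G5 score of 2000 would be about 82% of the Granite Rapids score and about 96% of the Zen5 score, and double the approximate G4 score.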

The feasibility of 128 cores depends on chiplet packaging, not on a process breakthrough. Instead of one monolithic 128-core die, which would have terrible yields, multiple smaller compute chiplets are stitched together, the same approach AMD EPYC has used since Zen2. With smaller compute dies on SMIC's 7nm-equivalent node, yields are controllable, and the total core count depends on how many dies are packaged. The +17% IPC improvement comes from four generations of accumulated full-flow design experience from G2 to G4, plus the maturity of domestic EDA tools for 7nm digital logic synthesis. The 16-channel DDR5 depends on Changxin Memory's DDR5 capacity expansion and Montage Technology's CXL controller IP. Among these three conditions, SMIC's 7nm capacity allocation is the hardest bottleneck.
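The yield argument for chiplets can be illustrated with the first-order Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. Every number below is a hypothetical placeholder chosen for illustration (the text gives no die sizes or defect densities for SMIC N+2), and the model ignores the I/O die, packaging yield, and partial-good dies.

```python
import math


def poisson_yield(area_mm2, defects_per_cm2):
    """First-order Poisson die yield: exp(-area * defect_density)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_package(die_area_mm2, dies_per_package, defects_per_cm2):
    """Wafer area consumed per good package, assuming dies are tested
    before assembly so only known-good dies are packaged."""
    return dies_per_package * die_area_mm2 / poisson_yield(die_area_mm2, defects_per_cm2)


# Hypothetical inputs, not vendor data.
D0 = 0.5                   # defects per cm^2
MONOLITHIC_AREA = 800.0    # mm^2 for a single 128-core die
CHIPLET_AREA = 100.0       # mm^2 per compute chiplet
CHIPLETS_PER_PACKAGE = 8   # 8 chiplets of equal total area

mono_yield = poisson_yield(MONOLITHIC_AREA, D0)
chiplet_yield = poisson_yield(CHIPLET_AREA, D0)
mono_silicon = silicon_per_good_package(MONOLITHIC_AREA, 1, D0)
chiplet_silicon = silicon_per_good_package(CHIPLET_AREA, CHIPLETS_PER_PACKAGE, D0)

print(f"Monolithic die yield: {mono_yield:.1%}")
print(f"Per-chiplet yield:    {chiplet_yield:.1%}")
print(f"Silicon per good package: {mono_silicon:.0f} mm^2 vs {chiplet_silicon:.0f} mm^2")
```

Under these assumed inputs the monolithic die yields under 2% while each chiplet yields about 61%, so the same total compute area costs far less wafer per good package when split into chiplets. The direction of the result holds for any positive defect density; the magnitudes depend entirely on the assumed numbers.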

The truly differentiating feature of Haiguang G5 is SMT4. At 512 threads per socket, it has the highest thread count per socket among current server CPUs; mainstream AMD and Intel products use SMT2. Where does SMT4 help? In multi-tenant inference services (logical CPU density doubles relative to SMT2 at the same core count, which lowers the cost per inference request), in Agentic AI orchestration layers (large numbers of lightweight I/O-bound and memory-bound threads), and in memory-constrained LLM inference (while some threads wait for KV-cache data, others can use the execution units). These are precisely the largest incremental scenarios in China's future market. China also has structural advantages on the inference side: lower electricity costs, plus the LLM inference optimization capabilities demonstrated by companies such as DeepSeek, mean higher token output for the same compute. If the CPU:GPU ratio in China's data centers ends up higher than in the US, the upside for CPU demand is even greater than in Bernstein's base case.
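The latency-hiding argument for SMT4 can be sketched with a classic toy model: if each hardware thread is stalled (waiting on memory or I/O) a fraction p of the time, independently of the others, a core has at least one runnable thread with probability 1 - p^n. The stall fraction below is a hypothetical value, and the model ignores cache and execution-port contention between threads, so it overstates real gains.

```python
def core_utilization(stall_fraction, threads_per_core):
    """Probability that at least one of n independent threads is runnable."""
    return 1.0 - stall_fraction ** threads_per_core


def logical_cpus(cores, threads_per_core):
    """Logical CPUs exposed per socket."""
    return cores * threads_per_core


STALL_FRACTION = 0.7  # hypothetical: a memory-bound thread stalled 70% of the time
CORES = 128           # core count quoted for G5

utilization = {n: core_utilization(STALL_FRACTION, n) for n in (1, 2, 4)}
for n, u in utilization.items():
    print(f"SMT{n}: {logical_cpus(CORES, n)} logical CPUs, "
          f"modeled core utilization {u:.0%}")
```

With a 70% stall fraction the modeled utilization rises from 30% with one thread to 51% with two and about 76% with four. The benefit shrinks as the stall fraction falls, which is why the advantage is claimed for I/O-bound and memory-bound workloads rather than compute-bound ones.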

Bernstein forecasts that Haiguang's market share by value will rise from 19% in 2025 to 36% in 2030, and that its share by shipment volume will approach 50% (because its ASP is lower than Intel's and AMD's). The logic has two steps. The first is continued absorption of existing domestic-substitution demand, which is policy-driven and does not rely on absolute performance. The second is the start of commercial procurement by CSPs after 2027, which is the main source of incremental growth. CSP customers will account for about 75% of China's x86 server deployments by 2030. For Haiguang to move from its current mid-single-digit penetration to 20% in this market, G5's performance must be sufficient and AMD/Intel supply must remain tight. Haiguang also has a unique lever: full-stack bundling of CPU and DCU (its AI accelerator card). When domestic AI chip supply is tight, Haiguang can use DCU allocation rights to secure CPU purchase volumes, and vice versa. No other company besides Huawei can do this.
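Two quantities are implicit in the share forecast and can be worked out from the quoted numbers. The sketch below assumes the value shares apply to the China x86 server CPU TAM quoted earlier in this article ($7 billion in 2025, $27 billion in 2030) and treats "nearly 50%" volume share as exactly 50%; both are simplifying assumptions, and the function name is illustrative.

```python
def relative_asp(value_share, volume_share):
    """ASP of one vendor relative to the average ASP of all other vendors.

    own ASP    is proportional to value_share / volume_share
    others ASP is proportional to (1 - value_share) / (1 - volume_share)
    """
    own = value_share / volume_share
    others = (1 - value_share) / (1 - volume_share)
    return own / others


# Figures quoted in the text (TAM in USD billions); volume share rounded to 50%.
tam = {2025: 7.0, 2030: 27.0}
value_share = {2025: 0.19, 2030: 0.36}
volume_share_2030 = 0.50

implied_revenue = {year: value_share[year] * tam[year] for year in tam}
asp_ratio_2030 = relative_asp(value_share[2030], volume_share_2030)

for year, revenue in implied_revenue.items():
    print(f"{year}: implied Haiguang x86 CPU revenue ${revenue:.2f}B")
print(f"2030 ASP relative to other vendors: {asp_ratio_2030:.0%}")
```

Under these assumptions the forecast implies revenue growing from about $1.3 billion to about $9.7 billion, and a 2030 ASP of roughly 56% of the average for the other vendors. These are derived illustrations, not figures stated by Bernstein.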

The whole thesis rests on two things: on the supply side, SMIC ramping advanced-node capacity fast enough to sustain G5 mass production; on the demand side, CSP customers' actual testing and acceptance. G5 was confirmed to have entered mass production in June, with supporting servers ranging from air-cooled dual-socket systems to immersion liquid cooling (80,000+ cores in a single group), but no independent SPEC benchmark data has been published so far. The 2000-point figure is Bernstein's estimate, not a measured result. CSPs require a POC verification cycle before large-scale procurement, and the path from sample delivery through qualification to batch deployment typically takes two to three quarters. The thing to watch is whether any top CSPs place centralized procurement orders in the second half of the year.
