Has the era of AI inference truly arrived? The three-way compute realignment of GPU, CPU, and ASIC

On June 22, 2026, U.S. chip stocks surged across the board: the Philadelphia Semiconductor Index rose 6.42% in a single day, Intel gained over 10% on news that it would collaborate with Apple on chip production, TSMC’s ADR rose 6.94% to close at $462.12, and Nvidia rose nearly 3%. Behind the market sentiment lies an industry judgment that is being borne out ever faster: the demand structure for AI computing power has shifted from training-driven to inference-driven.

According to industry analysis, inference accounted for about one-third of total AI compute demand in 2023, rises to about two-thirds in 2026, and is expected to reach 70% to 85% between 2028 and 2030. This structural shift is redefining the main battleground for chip competition, from “whose GPU trains the fastest” to “whose inference chip has the lowest total cost and highest throughput.”

The global AI inference chip market was valued at $85.4 billion in 2024 and is projected to grow from $105.47 billion in 2025 to $570.77 billion by 2033, a compound annual growth rate (CAGR) of 23.5% over the forecast period. Within it, the cloud AI inference chip market was valued at $102.19 billion in 2025 and is expected to grow to $118.9 billion in 2026 and $320.98 billion by 2032. Meanwhile, the global edge AI chipset market (covering both inference and training) is expected to grow from $34.4 billion in 2026 to $96 billion in 2031.
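The growth rates implied by these forecasts can be checked with the standard CAGR formula, (end / start)^(1 / years) - 1. The sketch below is only an arithmetic illustration: the function name `cagr` is hypothetical, and the inputs are the dollar figures quoted above, in billions.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.235 means 23.5%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted in the article, in billions of USD.
forecasts = {
    "Global AI inference chips, 2025-2033": (105.47, 570.77, 2033 - 2025),
    "Cloud AI inference chips, 2025-2032": (102.19, 320.98, 2032 - 2025),
    "Edge AI chipsets, 2026-2031": (34.4, 96.0, 2031 - 2026),
}

for name, (start, end, years) in forecasts.items():
    print(f"{name}: {cagr(start, end, years):.1%} per year")
```

The first line reproduces the 23.5% CAGR stated in the forecast; the other two rates are not stated in the source and are simply what the quoted endpoints imply.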

During this expansion cycle, the relative strength of the different chip types is undergoing subtle yet profound changes. GPUs remain the largest segment, supported by both training and inference demand, and are expected to maintain a 20% CAGR through 2031. However, many institutions view AI ASICs as the fastest-growing segment. JPMorgan analysts estimate that the digital AI ASIC market will reach about $60-70 billion by 2026, with a CAGR of 40% to 50% in the coming years.

More notably, the return of CPUs is drawing attention. Over the past three years, CPUs have been on the periphery of AI narratives, but the explosion in inference demand is changing this landscape.

Why CPUs Are Returning to Center Stage

AI inference and training differ fundamentally in their computational logic. Training is a large-scale parallel matrix workload: trillions of floating-point calculations can be performed simultaneously across tens of thousands of GPU cores, which is where GPUs have a clear advantage. But inference, especially for agentic AI, involves task orchestration, tool invocation, multi-step logical reasoning, and sequential decision-making. These workloads are not purely parallel computations; they rely heavily on the CPU’s strengths in complex logical control and serial processing.

A study by Georgia Tech and Intel indicates that in agentic AI scenarios, 50% to 90% of latency comes from CPUs rather than from accelerators, because large models need to call plugins, perform online searches, and handle multi-step logic, all of which is scheduled by the CPU. Nvidia itself acknowledged this reality in March 2026, when its executive Dion Harris publicly stated that “CPUs are becoming the bottleneck in AI workflows,” a surprising admission from a company long associated with the view that “GPUs are the only chips needed for AI.”

This trend shows up most clearly in configuration ratios. In AI training, the CPU-to-GPU ratio is typically heavily skewed, at about 1:8, with GPUs bearing most of the computational load. In the inference era, according to TrendForce reports, the ratio is rapidly moving toward the 1:1 to 1:2 range. Intel CEO Lip-Bu Tan also pointed out on the Q1 2026 earnings call that training workloads usually require 7-8 GPUs per CPU, while inference workloads have tightened to 3-4 GPUs per CPU, with further movement toward a 1:1 balance expected.

Using Nvidia CEO Jensen Huang’s estimates as a reference: each GW of data center capacity requires about 300k Rubin GPUs, while, assuming 136-core Arm CPUs, roughly 221k CPUs are needed per GW. The resulting CPU-to-GPU ratio is approximately 1:1.4. Compared with the GPU-dominant era of the past, the CPU’s standing has improved significantly.
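The ratio can be reproduced directly from the per-GW figures. The following is a minimal sketch: the helper `gpus_per_cpu` and the constant names are hypothetical, and the inputs are the estimates quoted above rather than independently verified numbers.

```python
def gpus_per_cpu(gpu_count: float, cpu_count: float) -> float:
    """Number of GPUs deployed for each CPU."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return gpu_count / cpu_count


# Per-gigawatt estimates quoted in the article.
GPUS_PER_GW = 300_000
CPUS_PER_GW = 221_000
CORES_PER_CPU = 136

ratio = gpus_per_cpu(GPUS_PER_GW, CPUS_PER_GW)
cores_per_gpu = CPUS_PER_GW * CORES_PER_CPU / GPUS_PER_GW

print(f"CPU:GPU ratio = 1:{ratio:.2f}")
print(f"CPU cores available per GPU = {cores_per_gpu:.0f}")

# For comparison, the ratios cited for earlier deployments.
for label, gpus in [("training (typical)", 8), ("inference (current)", 3.5)]:
    print(f"{label}: 1:{gpus}")
```

The result is about 1:1.36, consistent with the rounded 1:1.4 in the text, and it works out to roughly 100 CPU cores per GPU.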

The GPU’s Moat and Challenges in Inference Scenarios

Although CPUs are regaining ground, GPUs still occupy an irreplaceable position in AI inference, primarily due to memory bandwidth and parallel throughput.

In LLM inference, generating each token requires reading hundreds of millions to billions of parameters, which makes it a typical memory-bandwidth-bound task. CPU solutions rely on system DDR memory, with bandwidth usually in the 50-100 GB/s range; GPUs use GDDR6X or HBM memory, with bandwidth exceeding 800 GB/s, and the HBM2e memory on high-end GPUs can reach 1.5 TB/s, roughly 20 times that of CPUs. In Llama 3.1 8B inference, a CPU solution achieves only 819 tokens/sec, while an 8-GPU cluster reaches 46,841 tokens/sec. As concurrent requests increase, CPU throughput drops sharply from 819 tokens/sec to 257 tokens/sec, whereas the 8-GPU cluster shows almost no degradation.
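The bandwidth argument can be turned into a back-of-the-envelope ceiling on decode speed. The sketch below rests on simplifying assumptions that are not from the article: single-stream decoding reads every weight once per token, weights are stored in FP16 (2 bytes each), and KV-cache traffic and compute time are ignored. The function name is illustrative.

```python
def decode_ceiling_tokens_per_sec(
    bandwidth_bytes_per_sec: float,
    param_count: float,
    bytes_per_param: float = 2.0,  # ASSUMPTION: FP16 weights
) -> float:
    """Upper bound on single-stream decode speed when weight reads dominate."""
    if param_count <= 0 or bytes_per_param <= 0:
        raise ValueError("param_count and bytes_per_param must be positive")
    bytes_read_per_token = param_count * bytes_per_param
    return bandwidth_bytes_per_sec / bytes_read_per_token


PARAMS_8B = 8e9  # an 8-billion-parameter model, e.g. Llama 3.1 8B

# Bandwidth figures quoted in the article.
memory_systems = {
    "CPU DDR (low end)": 50e9,
    "CPU DDR (high end)": 100e9,
    "GPU GDDR6X/HBM": 800e9,
    "GPU HBM2e": 1.5e12,
}

for name, bandwidth in memory_systems.items():
    ceiling = decode_ceiling_tokens_per_sec(bandwidth, PARAMS_8B)
    print(f"{name}: at most {ceiling:.1f} tokens/sec per stream")
```

Under these assumptions the ceiling scales linearly with memory bandwidth, which is why the bandwidth gap translates so directly into throughput. The measured figures quoted above are much higher than any single-stream ceiling, most likely because batched serving reuses each weight read across many concurrent requests.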

In terms of compute density, GPUs achieve parallelism through thousands of CUDA cores, supporting low-precision formats like FP4/FP8, with compute power reaching hundreds of TFLOPS, while CPUs’ FP32 performance is typically in the 1-10 TFLOPS range.

These data points demonstrate that in high-throughput, high-concurrency inference scenarios—such as cloud AI services serving large-scale users—GPUs remain the optimal solution. Nvidia’s dominant position in this field remains unchallenged. According to SemiAnalysis, in Q1 2026, Nvidia held a 92% market share in AI training chips and 78% in inference chips. IDC estimates Nvidia controls about 81% of the AI chip market. The AI accelerator market was about $160 billion in 2025 and is heading toward over $200 billion in 2026, with inference spending expected to account for two-thirds.

However, the GPU’s share of inference faces pressure on several fronts: the return of CPUs, competition from specialized ASICs, and the realities of cost structure.

The CPU Manufacturers’ Inference Counterattack

The re-evaluation of CPUs’ value in inference has already translated into measurable market momentum.

The data center processor market is expanding rapidly, driven by surging demand for generative AI workloads, and is expected to grow from $215 billion in 2025 to $656 billion by 2031. Guohai Securities notes that hyperscale data centers are entering an “upgrade cycle,” with server CPU shipments expected to increase by 25% in 2026.

AMD is a significant beneficiary of this trend. AI server demand has driven EPYC CPU shipments, with the fifth-generation Turin already capturing a large share of the server CPU market. The server CPU business is expected to grow at least 50% in 2026, and Bernstein analysts forecast that sales of AMD’s flagship EPYC processors could rise 30% in 2026. As of early 2026, Intel holds about 60% of the data center CPU market, AMD about 24%, and Nvidia about 6%. AMD also competes with Nvidia in the AI GPU market with its Instinct accelerators, which gives it a distinctive two-pronged position in the inference era.

Intel is also actively adjusting its strategy. At Computex 2026, new CEO Lip-Bu Tan announced the adoption of the 18A process and a rack-scale disaggregated architecture: inference-era CPUs are returning to the main stage, shifting AI infrastructure from “buying the whole stack” to “building with Lego.” The Advanced Matrix Extensions (AMX) built into Xeon processors can accelerate inference for small and medium-sized large language models without additional GPUs or AI accelerators.

The most symbolic change comes from Nvidia itself. The company that defined the GPU era of AI is pushing its Grace and Vera CPU product lines in 2026, with Vera designed specifically for inference and agentic AI workloads. Nvidia expects its CPU revenue to reach $20 billion in 2026. The company also announced standalone Arm-based CPU products in 2026, marking its official entry into the CPU race.

The Rise of ASICs and Dedicated Chips: A Third Path

Beyond the binary narrative of GPUs and CPUs, ASICs (Application-Specific Integrated Circuits) are becoming the fastest-growing variable in the inference market.

TD Cowen forecasts that the share of merchant (off-the-shelf) accelerators will decline from about 91% in 2025 to around 75% in 2030, while custom ASICs will rise from about 9% to approximately 25%. ASIC server shipments are expected to grow 44.6% in 2026, compared with 16.1% for GPU servers, roughly a third of the ASIC growth rate.

Large cloud providers are accelerating in-house development of inference chips. Inference-optimized ASICs such as Google TPU, AWS Inferentia, and Meta MTIA are emerging rapidly, alongside designs from specialist vendors such as Groq’s LPU. Broadcom’s AI revenue in Q2 2026 reached $10.8 billion, up 143% year-over-year, with full-year AI revenue guidance at $56 billion, up 180%. Broadcom aims to capture about 60% of the custom AI chip market.

This trend indicates that the inference chip market is shifting from “general-purpose GPU dominance” to a “GPU + CPU + ASIC” diversified landscape. GPUs handle intensive training and large-scale inference, CPUs manage task orchestration and system control, while ASICs deliver extreme energy efficiency for specific inference workloads.

Cost Structure and the Reshaping of Inference Economics

The ultimate question in chip selection for inference is: what is the cost per million tokens?

During training, model accuracy and training time are the primary metrics, with higher cost tolerance. But inference is a continuous, high-frequency production activity—each API call, each user request incurs direct costs. This shifts the competitive focus of inference chips from “absolute performance” to “effective throughput per unit cost.”

GPU solutions tend to have higher hardware costs. For example, AMD’s MI300X costs around $10k to $15k, while Nvidia’s H100 ranges from $25k to $40k. But their unit compute cost is lower: on-demand pricing from one cloud provider shows GPU instances generating tokens at a 40-60% lower unit cost than CPU instances. CPU solutions have the advantage of requiring no additional hardware investment, which makes them suitable for low-concurrency, latency-sensitive single-task scenarios.

However, as inference scale increases, the marginal cost of CPU solutions rises faster: as concurrency grows, CPUs must schedule tasks via time-slicing, and context-switching overhead climbs steeply with the number of concurrent requests. This means that in large-scale inference deployments, the high initial investment in GPUs or ASICs can be offset by higher throughput and lower unit costs, leading to better long-term ROI.
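The cost-per-million-tokens metric follows from two numbers: what the hardware costs per hour and how many tokens it sustains per second. The sketch below uses the Llama 3.1 8B throughput figures quoted earlier in the article, but the hourly prices are hypothetical placeholders chosen only to illustrate the arithmetic, not quotes from any provider.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_sec: float) -> float:
    """USD needed to generate one million tokens at a sustained throughput."""
    if hourly_price_usd < 0 or tokens_per_sec <= 0:
        raise ValueError("price must be non-negative and throughput positive")
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


# Throughput figures quoted in the article (tokens/sec).
CPU_TPS_LIGHT_LOAD = 819
CPU_TPS_HIGH_CONCURRENCY = 257
GPU8_TPS = 46_841

# HYPOTHETICAL hourly prices in USD, for illustration only.
CPU_PRICE_PER_HOUR = 2.0
GPU8_PRICE_PER_HOUR = 40.0

scenarios = {
    "CPU, light load": (CPU_PRICE_PER_HOUR, CPU_TPS_LIGHT_LOAD),
    "CPU, high concurrency": (CPU_PRICE_PER_HOUR, CPU_TPS_HIGH_CONCURRENCY),
    "8-GPU cluster": (GPU8_PRICE_PER_HOUR, GPU8_TPS),
}

for name, (price, tps) in scenarios.items():
    cost = cost_per_million_tokens(price, tps)
    print(f"{name}: ${cost:.2f} per million tokens")
```

With these placeholder prices, the 8-GPU cluster costs 20 times more per hour yet comes out cheaper per token, and the CPU’s unit cost roughly triples once its throughput falls under concurrency. Different price assumptions change the absolute numbers, so the comparison should be rerun with real quotes before drawing conclusions.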

Conclusion

The share of inference compute demand rising from one-third to two-thirds reflects a deep transformation in the competitive logic of the chip industry.

For Nvidia, its dominant position in training (about 90% share) is unlikely to be challenged in the short term, but competition for incremental inference demand will intensify. New Street Research’s most aggressive forecast has Nvidia’s inference share dropping to 20-30% by 2028. Even conservative estimates, such as Bloomberg Intelligence’s expectation that Nvidia will hold a 70-75% share by 2030, imply some erosion, consistent with ASIC shipments growing much faster than GPU shipments.

For AMD and Intel, the resurgence of CPU demand in inference presents a structural opportunity. AMD’s two-pronged push with EPYC CPUs and Instinct GPUs, and Intel’s continued iteration on its 18A process and Xeon processors, are both attempts to seize this window.

For cloud providers and AI application developers, the increasing chip options mean more refined cost optimization. From general-purpose GPUs to custom ASICs, from CPU inference to GPU acceleration, hardware choices will increasingly depend on workload characteristics—model size, latency requirements, concurrency, and budget.

AI inference’s compute demand is growing faster than training’s. This shift from training to inference is reshaping the entire industry chain—from chip design to data center architecture. GPUs will not lose their position, but they are no longer the only answer.
