HBM vs DRAM: Why Can't Large AI Models Do Without HBM? Memory Chips Move from the "Planar Era" to the "3D Revolution"

On June 30, 2026, Bitcoin is trading in a narrow range around $60,000, while Ethereum remains in the $1,600 range. After the continuous correction since June, bears still dominate the crypto market in the short term. But just as crypto assets enter "garbage time," another sector is seeing unprecedented explosive growth: semiconductor memory.

The World Semiconductor Trade Statistics (WSTS) spring 2026 report sharply raised its industry growth forecast: the global semiconductor market could exceed $1.51 trillion in 2026, up 90% year-on-year, with memory chips growing 250% year-on-year to more than $800 billion. Memory output value would surpass that of wafer foundry for the first time, making memory the primary growth driver in semiconductors.

The undisputed protagonist of this memory revolution is HBM (High Bandwidth Memory). In 2026, the HBM market is expected to grow 58% to $54.6 billion, accounting for nearly 40% of the DRAM market. What exactly is the difference between HBM and DRAM, and why are large language models so dependent on HBM?

HBM vs DRAM: Same Origin, Different Destinies

HBM and DRAM share the same basic storage medium—Dynamic Random Access Memory. However, their technical routes, architecture design, and application scenarios have diverged in completely different directions.

Traditional DRAM follows a "planar expansion" route. Represented by DDR4 and DDR5, it uses a planar architecture and improves performance through process node upgrades (e.g., from the 20nm class to the 10nm class) and architecture optimizations (e.g., the longer prefetch in DDR5). The core logic is to keep shrinking cell size and raising frequency on a two-dimensional plane. But this path is approaching physical limits: further shrinkage runs into problems such as leakage and quantum tunneling, and process scaling alone can no longer meet AI computing's exponentially growing demand for memory bandwidth.

HBM, by contrast, breaks through by "vertical stacking." It uses a 3D structure in which multiple DRAM dies are stacked vertically with Through-Silicon Via (TSV) technology to form a cube: thousands of tiny holes are etched through each DRAM die, and vertical through-electrodes connect the dies above and below. The bottom layer is a logic die responsible for overall timing and control. This stacked design lets HBM deliver extremely high bandwidth density within a very small physical footprint.

The gap in key performance indicators between the two is generational:

Bandwidth: Traditional DRAM (e.g., DDR5) has a bandwidth of about 50-100 GB/s, while HBM3E can achieve 1.2 TB/s per stack, and the next-generation HBM4 is expected to reach over 2.0 TB/s. HBM's bandwidth is more than 10 times that of traditional DRAM.
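These headline figures follow from simple arithmetic: peak bandwidth is the interface width multiplied by the per-pin data rate. The sketch below is a rough illustration only. The interface parameters are assumptions that the article does not state (a 64-bit DDR5-6400 module, a 1024-bit HBM3E stack at about 9.6 Gb/s per pin, and a 2048-bit HBM4 stack at 8 Gb/s per pin), and the function name is hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8


# Assumed interface parameters (not given in the article):
# name -> (bus width in bits, data rate per pin in Gb/s)
ASSUMED_INTERFACES = {
    "DDR5-6400 module": (64, 6.4),
    "HBM3E stack": (1024, 9.6),
    "HBM4 stack": (2048, 8.0),
}

for name, (width, rate) in ASSUMED_INTERFACES.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):,.1f} GB/s")
```

Under these assumptions, one HBM3E stack delivers about 24 times the peak bandwidth of a single DDR5-6400 module, and the gain comes mainly from the much wider interface rather than from a faster per-pin rate.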

Power Efficiency: HBM's energy per bit can be 5 pJ/bit or lower, versus 10-15 pJ/bit for traditional DRAM. In data centers with thousands of GPUs running simultaneously, this difference translates to tens of millions of dollars in annual electricity costs.
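To turn pJ/bit into watts, multiply the energy per bit by the number of bits moved per second. The sketch below is illustrative and rests on two assumptions that are not from the article: both memory types are compared at the same sustained 1.2 TB/s of traffic, and 12.5 pJ/bit is taken as the midpoint of the 10-15 pJ/bit range.

```python
def transfer_power_watts(pj_per_bit: float, bandwidth_tb_s: float) -> float:
    """Power in watts = energy per bit (J) x bits transferred per second."""
    bits_per_second = bandwidth_tb_s * 1e12 * 8
    return pj_per_bit * 1e-12 * bits_per_second


SUSTAINED_TB_S = 1.2  # assumed traffic level, equal for both memory types

hbm_watts = transfer_power_watts(5.0, SUSTAINED_TB_S)    # 5 pJ/bit from the text
dram_watts = transfer_power_watts(12.5, SUSTAINED_TB_S)  # assumed midpoint of 10-15 pJ/bit

print(f"HBM:  {hbm_watts:.0f} W")
print(f"DRAM: {dram_watts:.0f} W")
print(f"Difference: {dram_watts - hbm_watts:.0f} W per {SUSTAINED_TB_S} TB/s of traffic")
```

With these inputs the gap is roughly 72 W for every 1.2 TB/s of sustained traffic (about 48 W versus 120 W). The total saving then scales with the number of stacks and accelerators in a deployment.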

Latency: Thanks to its planar architecture, traditional DRAM can hold latency at the 10 ns level, while HBM, with its additional stacked layers, sits at the 100 ns level. However, AI training and inference are far more sensitive to throughput than to single-access latency: moving massive parameter sets at high speed matters much more than how fast any single access completes.
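One way to see why throughput can outweigh latency is Little's law: the amount of data that must be in flight equals throughput times latency. The sketch below applies it to the article's figures (1.2 TB/s and about 100 ns). The 64-byte request size is an assumption for illustration, and the function name is hypothetical.

```python
def bytes_in_flight(bandwidth_tb_s: float, latency_ns: float) -> float:
    """Little's law: outstanding data (bytes) = throughput (B/s) x latency (s)."""
    return bandwidth_tb_s * 1e12 * latency_ns * 1e-9


REQUEST_BYTES = 64  # assumed size of one memory request

needed = bytes_in_flight(bandwidth_tb_s=1.2, latency_ns=100)
print(f"Data in flight to saturate one stack: {needed / 1e3:.0f} kB")
print(f"Outstanding {REQUEST_BYTES}-byte requests: {needed / REQUEST_BYTES:.0f}")
```

A processor that keeps this many requests outstanding, as massively parallel accelerators are designed to do, can use the full bandwidth even though each individual access takes around 100 ns.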

Cost: HBM production costs are much higher than traditional DRAM. Although the cost per Gb of HBM4 is 30% lower than HBM3, it is still 3-5 times that of DDR5 with the same capacity. HBM consumes about 4-5 times more wafer area than DDR5. The TSV process makes HBM's bit density significantly lower than that of DDR at the same specification—SK hynix's D1z DDR4 has a bit density of 0.296 Gb/mm², 85% higher than its HBM3 (0.16 Gb/mm²). The additional area required for TSV and the complex stacking and packaging process are the core reasons for HBM's high cost.
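The bit-density comparison can be checked directly from the two quoted figures. The sketch below also converts density into die area per gigabyte. It considers silicon area only and ignores stacking yield and packaging, which the text names as further cost drivers. The function name is hypothetical.

```python
GBIT_PER_GBYTE = 8


def die_area_mm2_per_gbyte(density_gbit_per_mm2: float) -> float:
    """Silicon area (mm^2) needed to hold one gigabyte at a given bit density."""
    return GBIT_PER_GBYTE / density_gbit_per_mm2


ddr4_density = 0.296  # Gb/mm^2, SK hynix D1z DDR4 (figure from the text)
hbm3_density = 0.16   # Gb/mm^2, SK hynix HBM3 (figure from the text)

density_advantage = ddr4_density / hbm3_density - 1
print(f"DDR4 bit density advantage: {density_advantage:.0%}")
print(f"DDR4 area per GB: {die_area_mm2_per_gbyte(ddr4_density):.1f} mm^2")
print(f"HBM3 area per GB: {die_area_mm2_per_gbyte(hbm3_density):.1f} mm^2")
```

On these numbers, each gigabyte of HBM3 needs about 50 mm² of DRAM die area against about 27 mm² for DDR4, before any packaging cost is counted.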

In short: Traditional DRAM pursues "cheap and sufficient," while HBM pursues "extreme bandwidth"—this is a battle between "cost-first" and "bandwidth-first" technology routes.

The Memory Wall Crisis: Why Are AI Large Language Models Absolutely Dependent on HBM?

The dependence of AI large language models on HBM stems from a fundamental bottleneck known in the industry as the "Memory Wall."

Over the past 20 years, GPU computing power has increased 60,000 times, while DRAM bandwidth has increased only 100 times. Compute growth has far outpaced the speed of data supply, like a racing car whose horsepower has soared while its fuel lines are still 20 years old. The GPU is the engine; HBM is the fuel injection system. If fuel delivery cannot keep up, the engine can only idle, no matter how powerful it is.
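Converting the two 20-year multiples into compound annual rates makes the divergence easier to picture. This is a minimal sketch using only the figures quoted above; the function name is hypothetical.

```python
def annual_growth_factor(total_multiple: float, years: float) -> float:
    """Compound annual growth factor that yields `total_multiple` after `years` years."""
    return total_multiple ** (1 / years)


YEARS = 20
compute = annual_growth_factor(60_000, YEARS)  # GPU compute multiple from the text
bandwidth = annual_growth_factor(100, YEARS)   # DRAM bandwidth multiple from the text

print(f"Compute growth:   {compute - 1:.0%} per year")
print(f"Bandwidth growth: {bandwidth - 1:.0%} per year")
print(f"Gap after {YEARS} years: {60_000 / 100:.0f}x")
```

That works out to roughly 73% per year for compute against roughly 26% per year for bandwidth, which compounds into a 600-fold gap over the period.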

The operating mechanism of large language models amplifies this contradiction. AI models do not simply retrieve static information to generate responses; they continuously maintain a "working state" that includes context windows, key-value cache (KV Cache), intermediate activations, and routing decisions. This data must be accessed in real-time with ultra-low latency and always be available. During the processing of a complete token sequence, the model must continuously access and update the context—even a slight increase in memory latency can lead to reduced throughput, response delays, or even force operators to add more hardware.
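The KV cache is a good example of how this working state grows. For a standard transformer, its size is 2 (keys and values) times layers, KV heads, head dimension, sequence length, batch size, and bytes per value. The sketch below uses a hypothetical model configuration chosen only for illustration; none of these parameters come from the article.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_value: int = 2) -> int:
    """KV cache size in bytes; the leading 2 counts one key and one value tensor per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value


# Hypothetical configuration: 80 layers, 8 KV heads of dimension 128, 16-bit values.
per_token = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=1, batch=1)
total = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=32_768, batch=16)

print(f"Per token: {per_token / 2**10:.0f} KiB")
print(f"16 sequences of 32,768 tokens: {total / 2**30:.0f} GiB")
```

For this hypothetical setup, each token adds 320 KiB and the full batch occupies 160 GiB, all of which has to be read again for every newly generated token. That is why both memory capacity and bandwidth come under pressure as context windows grow.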

Training Phase: Large models with trillions of parameters need to iterate repeatedly over massive amounts of data. Each forward and backward propagation involves reading and updating huge volumes of parameters. The TB/s-level bandwidth provided by HBM is the decisive factor in reducing training time.

Inference Phase: With the accelerated development of multimodal large models and AI agents, the volume of tokens processed is rising rapidly. The bottleneck for inference applications is often not "how fast can it compute," but "how fast can data be fed." And the answer to the bandwidth problem is ultimately HBM.
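A common back-of-the-envelope bound makes the "how fast can data be fed" point concrete: if every weight must be read once per generated token, single-sequence decode speed cannot exceed memory bandwidth divided by model size. The sketch below is a simplification under stated assumptions (batch size 1, KV cache traffic and compute time ignored). The 70-billion-parameter model, the 16-bit weights, and the 9.6 TB/s accelerator (eight stacks at 1.2 TB/s) are hypothetical.

```python
def max_decode_tokens_per_s(params_billion: float, bytes_per_param: float,
                            bandwidth_tb_s: float) -> float:
    """Upper bound on tokens/s when each token requires one full read of the weights."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / model_bytes


# Hypothetical 70B-parameter model stored in 16-bit weights (2 bytes per parameter).
hbm_bound = max_decode_tokens_per_s(70, 2, bandwidth_tb_s=9.6)  # assumed 8 x 1.2 TB/s stacks
ddr_bound = max_decode_tokens_per_s(70, 2, bandwidth_tb_s=0.1)  # 100 GB/s, upper DDR5 figure

print(f"HBM-fed accelerator: at most {hbm_bound:.1f} tokens/s per sequence")
print(f"DDR5-fed system:     at most {ddr_bound:.2f} tokens/s per sequence")
```

Under these assumptions the bound is about 69 tokens/s with HBM and under 1 token/s with DDR5-class bandwidth, regardless of how much compute is available.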

At the system level, AI runs on a hierarchical memory architecture: HBM supplies data to accelerators, DRAM stores real-time states and conversation memory, and NAND-based SSDs provide persistent storage for datasets, embeddings, retrieval indexes, logs, and checkpoints. HBM sits closest to the computing core, handling the highest-frequency and most urgent data supply tasks—a role that no other storage medium can replace.

Because of this, all leading AI accelerators used for generative AI training and inference use HBM. HBM is not an "optional accessory" for AI; it is the "oxygen tank" that determines how fast AI can go.

Supply-Demand Imbalance: A Multi-Year Structural Shortage

Demand for HBM is rigid, while supply is "locked in."

Demand Side: In 2026, global AI infrastructure spending will reach $450 billion, with inference computing accounting for over 70% for the first time, driving strong demand for GPUs, HBM, and high-speed network chips. HBM demand growth in 2026 is mainly driven by AI ASIC capacity upgrades, with HBM capacity per AI chip significantly increasing from 96 GB/192 GB to 216 GB/288 GB. Although Nvidia's Rubin platform keeps HBM capacity per GPU flat with the previous generation, higher shipment volumes continue to push overall demand higher. The combined capital expenditure of the world's nine largest cloud service providers in 2026 is expected to reach approximately $830 billion, up 79% year-on-year.

Supply Side: Although the three major manufacturers—Samsung, SK hynix, and Micron—have allocated 70% of new/adjustable capacity to HBM, the HBM capacity gap remains as high as 50% to 60%. As of the first quarter of 2026, all HBM capacity from the three major manufacturers has been sold out. According to SemiAnalysis data, DRAM supply in 2026 will fall short of demand by about 7%, with an HBM gap of 6%, expanding to 9% in 2027.

More critically, supply is rigid. Even if the three major manufacturers decide to expand capacity now, constrained by TSV processes, advanced packaging yields, equipment delivery cycles, and other physical limitations, new capacity cannot be released until at least 2028-2029. International investment banks generally believe that the structural shortage of HBM supply will last at least until 2028. Nvidia CEO Jensen Huang explicitly stated that the global HBM supply shortage "is not a short-term market fluctuation, but a structural industry dilemma that will last for years."

Pricing Side: Samsung Electronics and SK hynix have raised the supply price of HBM3E in 2026 by nearly 20%. The initial contract price for HBM4 12-layer is expected to be more than 10% higher than that of HBM3E 12-layer in 2025.

Market Landscape: Who Is Leading This Memory Revolution?

The HBM market is highly concentrated. Institutional forecasts indicate that in 2026 SK hynix will hold about 52% of shipments, Samsung Electronics about 39%, and Micron about 8%, with Chinese manufacturers holding only a very small share. In terms of sales, SK hynix's HBM revenue in 2026 is expected to reach $5.95 billion, firmly ranking first globally.

In the first quarter of 2026, SK hynix's market share in the global HBM market was about 51.4%. TrendForce expects SK hynix to maintain about a 50% HBM market share for the full year 2026; Counterpoint even forecasts its share in the HBM4 market will reach 54%.

Gross margins for the three major manufacturers have exceeded 70% or even 80%. HBM profit distribution follows a "pyramid" structure—the closer to the core technology and bottleneck links, the higher the allocation ratio.

Meanwhile, an interesting phenomenon is occurring: the profitability of general-purpose DRAM is structurally overtaking HBM. As of the first quarter of 2026, the gap in operating profit margins between general-purpose DRAM and HBM has widened to more than 15 percentage points. Market calculations show that allocating capacity to general DRAM in 2026 yields more than double the revenue per wafer compared to HBM, with nearly triple the gross profit. This is precisely why SK hynix is considering reallocating some resources back to general DRAM—but this actually confirms that the entire memory market is in a comprehensive boom.

Investment Perspective: Opportunities in the HBM Super Cycle

The structural shortage and upward price trend of HBM provide investors with clear industrial logic support.

Memory Manufacturers are direct beneficiaries. SK hynix (Korea), Samsung Electronics (Korea), and Micron (US) capture the vast majority of excess profit in the supply chain due to technological monopoly and capacity scarcity. Based on a forecast of DRAM average selling prices rising 62% by 2026, Morgan Stanley raised profit forecasts for memory manufacturers by 56% to 63%.

Upstream Supply Chain also benefits. Large-scale capacity expansion by memory leaders directly drives demand for semiconductor equipment such as etching, thin-film deposition, and testing. The industry chain's prosperity is being transmitted from upstream to midstream. HBM's advanced packaging requirements have also driven the industrialization of 2.5D packaging technologies like CoWoS.

AI Chip Manufacturers are the final demand side for HBM. Leading AI chip companies like Nvidia (US) and Broadcom (US) continue to expand their HBM procurement. Nvidia's Rubin Ultra will increase HBM capacity per GPU to 1 TB.

Gate Stock Trading: One-Stop Participation in Global Memory and AI Investment

For investors looking to participate in this memory super cycle, Gate Stock provides a convenient entry channel.

Currently, Gate Stock offers 24/7 trading across three core markets (US stocks, Hong Kong stocks, and Korean stocks), supporting over 10,000 US stocks and ETFs, more than 1,500 Hong Kong stocks, and over 1,000 Korean stocks, for a total of more than 12,500 global stocks and ETFs. The offerings include representative global listed companies such as Apple, Nvidia, Microsoft, Tencent Holdings, Xiaomi Corporation, Samsung Electronics, and SK hynix.

Users can participate in global stock investment using USDT through a unified Gate account, with support for fractional share trading starting from as low as 0.01 shares and entitlement to dividend distributions. The platform also supports corporate actions such as stock splits and reverse splits, with full coverage on both App and Web.

Building on its existing pre-market, regular, and after-hours sessions, Gate Stock also supports overnight and weekend trading, removing the time limits of traditional securities markets. Inter-broker transfer services are also coming soon, further enhancing the flexibility and convenience of user stock asset management.

Trading Method: After depositing funds into their unified Gate account, users can select target stocks in the stock trading module and trade using USDT pricing. The platform provides real-time quotes, technical analysis tools, and order type selection (market orders, limit orders, etc.), with an operation process consistent with the crypto asset trading experience.

Conclusion

The difference between HBM and DRAM is essentially a divergence between two technical routes: "bandwidth-first" versus "cost-first." Against the backdrop of continuous AI computing expansion, HBM has broken through the "memory wall" with its 3D stacking and TSV technology, becoming an irreplaceable core component for large model training and inference.

In 2026, the global semiconductor market size will exceed $1.51 trillion, memory chip growth will be 250%, and the HBM market will grow 58% to $54.6 billion. The capacity gap is as high as 50% to 60%, and all capacity from the three major manufacturers is sold out. This is not an ordinary cyclical fluctuation but a structural transformation driven by long-term capital expenditure on AI infrastructure.

For investors, the three major chains (memory manufacturers, equipment/materials, and AI chips) all have clear industrial logic behind them. Gate Stock's 24/7 trading service for US, Hong Kong, and Korean stocks gives global investors flexible and efficient tools to participate in this memory super cycle. At a time when market sentiment is in extreme fear (fear index 14-16), the divergence between industry fundamentals and market sentiment often breeds the most noteworthy structural opportunities.

FAQ

Q1: What is the core difference between HBM and DRAM?

The core difference between HBM and traditional DRAM lies in architecture. Traditional DRAM uses a planar architecture, improving performance through process upgrades; HBM uses 3D stacking technology, employing TSV (Through-Silicon Via) to vertically stack multiple DRAM dies, achieving an ultra-wide data path. HBM3E bandwidth can reach 1.2 TB/s, more than 10 times that of DDR5, but its cost is also 3-5 times that of DDR5 with the same capacity.

Q2: Why must AI large language models use HBM?

Training and inference of large models require high-speed reading and writing of massive parameters. Traditional DRAM bandwidth growth lags far behind computing power improvement (computing power increased 60,000 times in 20 years, bandwidth only 100 times), creating a "memory wall" bottleneck. HBM, with its TB/s-level bandwidth, can continuously supply data to the GPU, preventing computing power from idling. All leading AI accelerators use HBM.

Q3: Who are the main players in the HBM market?

The HBM market is highly concentrated. SK hynix will have about a 52% shipment market share in 2026, Samsung about 39%, and Micron about 8%. SK hynix ranks first in sales, with HBM revenue expected to reach $5.95 billion in 2026. All HBM capacity from the three major manufacturers for 2026 is sold out, with some customers having locked capacity until 2028.

Q4: How long will the HBM supply shortage last?

International investment banks generally believe the HBM supply shortage will last at least until 2028. Demand is driven by AI infrastructure capital expenditure, while supply is constrained by physical limitations such as the TSV process, packaging yields, and equipment delivery cycles. Even if expansion starts now, new capacity will not come online until 2028-2029 at the earliest. Nvidia CEO Jensen Huang has called it "a structural industry dilemma that will last for years."

Q5: How can I invest in HBM-related stocks on the Gate platform?

Gate Stock supports 24/7 trading of US, Hong Kong, and Korean stocks, covering over 12,500 stocks and ETFs. Users can invest using USDT through a unified account, with a minimum of 0.01 shares. HBM-related targets include memory manufacturers SK hynix (Korea), Samsung Electronics (Korea), Micron (US), and AI chip maker Nvidia (US).
