From Elpida's downfall to Micron's rise: rebuilding the underlying logic of memory chips
In 2012, Japan's leading DRAM manufacturer, Elpida, officially declared bankruptcy.
A onetime standard-bearer of Japan's semiconductor industry, Elpida combined the core DRAM technologies of three giants: NEC, Hitachi, and Mitsubishi. Even with government funding behind it, the company could not reverse its decline. After accumulating 430 billion yen of debt it filed for bankruptcy protection, was acquired by Micron Technology of the United States for 200 billion yen, and, once absorbed, exited the industry stage for good.
Looking back at the industry's history, Intel, Texas Instruments, and Motorola all entered the DRAM race at one point, only to withdraw one after another. Japan's entire semiconductor memory industry went from prosperity to collapse in under twenty years. Korean companies then took over: Samsung and SK Hynix, backed by government subsidies and aggressive price wars, swept the global market and squeezed out every remaining competitor.
Micron Technology emerged as the final survivor and is today the only U.S. company capable of mass-producing advanced memory chips. Headquartered in Boise, Idaho, it has long stood in the shadow of Nvidia's and TSMC's industry halo: it designs no GPUs and manufactures no logic chips. Yet as demand for AI compute explodes, a physical bottleneck dormant for decades has become impossible to ignore: compute units now spend more time waiting for data to arrive than computing on it. No software optimization can remove this pain point; only breakthroughs in hardware can, and that hardware is precisely what Micron has spent forty years cultivating.
The core bottleneck of AI computing: the memory wall as a common industry challenge
Under the von Neumann architecture, GPU and TPU compute units are physically separate from main memory. The compute units carry small amounts of SRAM on-die as cache, while large-model weights and input data live mainly in external DRAM, and every access means moving data across that boundary as electrical signals.
For example, a large language model with 70 billion parameters needs about 140GB of memory just for its weights at FP16 precision. Mainstream high-end AI accelerator cards today carry 80GB to 192GB of VRAM, so large models must be split across multiple cards working in concert. Over the past decade chip compute has grown exponentially, but memory bandwidth has lagged far behind, limited by pin counts, signal frequencies, and heat dissipation. When compute outruns the memory system's ability to feed it, the compute units sit idle and hardware utilization collapses.
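As a sanity check on the 140GB figure, the arithmetic is simply parameter count times bytes per parameter; a minimal Python sketch:

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone; FP16 stores 2 bytes per parameter."""
    return n_params * bytes_per_param / 1e9

print(weight_memory_gb(70e9))  # 140.0 GB -- before any KV cache or activations
```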
AI workloads divide into two core scenarios, training and inference, with very different underlying logic. Training is dominated by large-scale parallel processing: the same data is repeatedly fetched from cache, arithmetic intensity is high, and the bottleneck is compute speed rather than memory. It is the classic compute-bound scenario, where Nvidia's compute advantage shows at full strength.
Inference follows entirely different logic. Large language models generate text token by token through an autoregressive mechanism, and to avoid recomputing attention over the history, the system keeps a KV cache in VRAM. At a context length of about 4096 tokens, a single user request consumes roughly 1.34GB of VRAM; after subtracting model weights, two A100 cards have only about 20GB left for KV cache, enough for at most 14 concurrent requests. Arithmetic intensity during inference is extremely low and performance is bounded almost entirely by memory bandwidth: it is a memory-bound workload, where the physical transfer rate of HBM directly sets the ceiling on throughput.
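The text's figures can be reproduced under an assumed model geometry: 80 layers, 8 KV heads (grouped-query attention), head dimension 128, which matches a Llama-2-70B-style model but is not stated in the article. A short sketch:

```python
def kv_cache_gb(seq_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per: int = 2) -> float:
    # K and V each store n_layers * n_kv_heads * head_dim values per token (FP16)
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per / 1e9

per_request = kv_cache_gb(4096)      # ~1.34 GB per 4096-token request
free_vram = 2 * 80 - 140             # two 80GB A100s minus 140GB of weights
print(round(per_request, 2), int(free_vram // per_request))  # 1.34, 14
```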
From an energy consumption perspective, reading each bit from external HBM consumes about 10–20 pJ/bit, while a single FP16 floating-point operation consumes only 0.1 pJ. Data movement energy is 100 to 200 times higher than computation energy. In large-scale inference scenarios, if memory access patterns cannot be optimized, data centers will spend a large amount of electricity on bus data transfer rather than actual logical computation. This is also a core driving force behind Micron’s continued deep investment in HBM technology.
Micron’s fundamentals and positioning in the AI supply chain
Micron is a classic IDM (integrated device manufacturer): chip design, wafer fabrication, packaging, and testing are all done in-house. Its fabs are dedicated to memory and do not make CPUs or GPUs; the product lines are memory and flash storage.
By revenue, DRAM accounts for over 70%, NAND flash roughly 20–30%, and NOR flash a small remainder. DRAM is the core of general-purpose memory modules, NAND is the key medium in solid-state drives, and NOR flash, used mainly in automotive electronics and industrial equipment for fast boot-code execution, is a niche market but an irreplaceable one.
The business splits into four major segments: compute and networking for data centers and servers, mobile for smartphones, enterprise storage SSDs, and embedded products for automotive and industrial applications.
In the AI supply chain, Nvidia designs the GPUs and TSMC runs the foundry. Micron sits in neither of those stages, yet it is an indispensable component supplier for AI accelerator cards. A GPU's logic chip alone cannot run a large model; since the inference bottleneck is memory bandwidth, Nvidia's GPUs must be tightly coupled with high-bandwidth HBM. Micron, SK Hynix, and Samsung are the key HBM suppliers, and their stacks are bonded to GPUs through TSMC's CoWoS advanced packaging to form complete AI compute modules. The GPU is the AI brain; HBM is its high-speed data channel. Neither works without the other.
The competitive logic differs too: Nvidia's moat is architecture and ecosystem, while Micron builds its barriers through process technology and stacked packaging, iterated generation after generation. Each jump in HBM bandwidth depends on more precise through-silicon vias (TSVs) and higher stack counts, with extremely high technical barriers to entry.
DRAM: The underlying infrastructure behind AI computing power
In traditional computer architecture, DRAM serves as main memory, bridging the speed gap between large, slow hard drives and the CPU's fast but tiny caches. When a program runs, data is loaded from disk into DRAM, and the CPU reads and writes it at nanosecond latency and very high bandwidth; the kernel and background processes reside in DRAM at all times. DRAM is volatile: its data vanishes on power loss, and because the internal capacitors leak naturally, it must be refreshed continuously to retain data. The basic cell is one transistor plus one capacitor (1T1C).
With the arrival of AI, both the form and the demand logic of DRAM have been fundamentally remade. The compute core has shifted from CPU to GPU, and DRAM is no longer confined to DDR modules on a motherboard: it now appears as high-bandwidth memory (HBM), vertically stacked with through-silicon via (TSV) technology and packaged alongside the GPU on a silicon interposer.
The core value of DRAM today sits on two dimensions. First, holding large-model weights: a 70-billion-parameter model in FP16 needs 140GB, all of which must be loaded into HBM before inference can begin. Second, the KV cache's dynamic footprint: generating text requires caching the historical context, and the longer the context, the more VRAM it consumes, capping the concurrency of a single high-end server. Training consumes even more VRAM, since intermediate activations, optimizer states, and other bookkeeping must be stored too, pushing demand to three to four times that of inference.
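To put numbers on the context-length scaling, the same sketch as above (same assumed 70B-class geometry, not figures from the text) can be evaluated at longer contexts:

```python
def kv_cache_gb(seq_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per: int = 2) -> float:
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per / 1e9

for ctx in (4_096, 32_768, 131_072):
    print(ctx, round(kv_cache_gb(ctx), 1))  # 1.3 GB, 10.7 GB, 42.9 GB per request
```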
Constrained by the memory wall, GPU computing power has outpaced memory bandwidth growth, causing GPUs to frequently idle during inference. Upgrading HBM bandwidth directly determines the throughput limit of AI inference servers, which is the underlying logic for Micron’s increased investment in HBM R&D.
The memory industry's big three: differentiated competition among Samsung, SK Hynix, and Micron
The global DRAM market is dominated by Samsung, SK Hynix, and Micron, accounting for about 95% of the market share, with each having distinct core advantages.
In process-node iteration Micron leads the industry: from 1-alpha through 1-beta to 1-gamma, it has consistently been first to mass-produce each new high-density DRAM node. That means more chips per wafer and a lower manufacturing cost per bit, which translates into a gross-margin advantage. Samsung's sub-14nm processes have hit yield bottlenecks that slowed its cadence; SK Hynix's process progress is roughly on par with Micron's.
The HBM landscape looks quite different. SK Hynix remains the leader with over 50% market share and is the launch supplier for Nvidia's high-end GPUs; its MR-MUF packaging gives it an edge in heat dissipation and yield control for tall stacks. Micron came late, skipped HBM3 entirely and went straight to HBM3E, riding its energy-efficiency advantage into Nvidia's supply chain with TC-NCF packaging, which is harder to manufacture; its capacity and share still trail SK Hynix's. Samsung failed Nvidia's qualification during the HBM3 and HBM3E generations over heat and power issues, missing the window of the AI memory boom, and is now betting on HBM4 for a breakthrough.
Energy efficiency is Micron's key differentiator: at the same bandwidth, its HBM draws 20–30% less power than competitors'. The difference looks small per card, but across tens of thousands of cards in a data center it cuts electricity and cooling costs significantly. Its 1-gamma LPDDR5X likewise reaches 9.6Gbps at 30% lower power, a good fit for the power budgets of on-device AI models.
On capacity, Samsung keeps the top spot through sheer volume and uses price wars to control the market. Micron has the smallest capacity, so it avoids commoditized price competition and takes the technology-premium route, leaning on process and energy efficiency to hold its position.
Beyond DRAM and HBM, Micron's second growth curve is NAND and NOR flash. Its NAND share hovers around 10–15%, fourth or fifth globally. In NOR flash it has abandoned the low-end consumer market to focus on high-end automotive and industrial applications, leading the xSPI octal high-speed interface standard with products certified to ASIL-D functional-safety levels. By committing to supply from its own fabs for over ten years, it binds key automotive and industrial customers, sidesteps price wars, and earns a premium on reliability and performance.
Micron’s valuation logic and peer comparison
Micron currently trades around $600 per share, at a PE ratio of 21.44 and a market cap of roughly $650 billion. Wall Street banks' 12-month price targets range from $400 to $675, averaging close to $500, a spread that brackets the current price.
Memory chips have historically been a cyclical industry: booms drive capacity expansion, expansion tips into oversupply, and prices crash, so the market has generally assigned a PE of 8 to 10. Micron's valuation has re-rated sharply mainly because HBM has restructured its revenue stream. Traditional DDR memory is exposed to supply-demand swings, but HBM is sold on a build-to-order model, with long-term, irrevocable supply agreements signed with major customers such as Nvidia before production begins. With HBM capacity sold out through 2026, revenue shifts from cyclical fluctuation toward contracted, stable income, and the market has re-rated Micron as an AI infrastructure supplier with higher multiples.
Policy and capital add support: as the only advanced memory manufacturer in the U.S., Micron benefits from the CHIPS Act and the onshoring of supply chains, and continued institutional allocation gives it a liquidity premium.
Among peers, SK Hynix trades at a PE of only 12.17 despite holding more than half the HBM market and supplying Nvidia's high end. Korea's chaebol governance keeps dividends and buybacks relatively low, and nearly 40% of its conventional DRAM capacity sits in Wuxi, China, where export controls on equipment prevent upgrades to advanced nodes, creating risks of capacity migration and asset write-downs that suppress the valuation.
Samsung Electronics carries a PE of 34.18, but not because of any valuation premium: the net-profit denominator has shrunk. Samsung spans memory, foundry, smartphones, and display panels, and its foundry arm spends heavily on advanced nodes while suffering low yields, dragging down group profit. With domestic funds holding the share price steady, depressed earnings against a stable price push the PE up.
The institutional bull case for Micron rests on four points: a rising HBM revenue share lifts gross margin; long-term supply agreements lock in revenue certainty; capacity shifting toward HBM constrains general DRAM supply and supports prices across the product line; and with the 1-gamma process in mass production, the company enters a capex-harvest period with steadily improving free cash flow. Note, though, that the memory cycle has not disappeared, only been smoothed by long-term HBM orders. If AI infrastructure spending slows or Samsung's HBM4 leapfrogs on technology, the supply-demand balance could be reshuffled again.
Core standards for HBM and the next-generation interconnect technology CXL
Every manufacturer touts the advantages of its own HBM; three key parameters determine how good an HBM product really is:
First is the per-pin data rate, which determines transfer bandwidth. An HBM stack connects to the GPU through thousands of micro-bumps, and the pin rate is the number of bits each pin moves per second. The industry-standard bus width is fixed at 1024 pins, so total bandwidth follows directly from the pin rate. Micron's HBM3E is rated at 9.2Gbps, giving a per-stack bandwidth of about 1.2TB/s, ahead of the mainstream 8.0 to 8.5Gbps parts. Higher rates carry costs, though: power rises, frequent voltage toggling generates heat, and pushing the rate too far distorts signals and causes bit errors, undermining transfer stability.
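The conversion the text refers to is just bus width times pin rate divided by 8 bits per byte; a quick sketch:

```python
def hbm_stack_bandwidth_tbs(pin_rate_gbps: float, bus_width: int = 1024) -> float:
    # TB/s = pins * (Gb/s per pin) / 8 bits-per-byte / 1000 GB-per-TB
    return bus_width * pin_rate_gbps / 8 / 1000

print(hbm_stack_bandwidth_tbs(9.2))  # ~1.18 TB/s: the ~1.2 TB/s per-stack figure
print(hbm_stack_bandwidth_tbs(8.0))  # ~1.02 TB/s at the mainstream 8.0 Gbps
```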
Second is energy efficiency, measured in pJ/bit; lower is better. Because HBM sits in the same package as the GPU, excess power worsens heat dissipation and forces the GPU to downclock. Micron's low-voltage design on the 1-beta process yields roughly 30% better energy efficiency than competitors, meaningfully cutting data-center electricity and cooling costs.
Third is thermal resistance and the packaging process, SK Hynix's core advantage. Temperature rise is determined jointly by power draw and thermal resistance, and HBM's multi-layer stack makes heat hard to extract: the material filling the gaps between layers directly sets the thermal resistance. The two mainstream processes are TC-NCF and MR-MUF. Micron and Samsung use TC-NCF, which can trap residual bubbles and leave higher thermal resistance; SK Hynix's MR-MUF fills the gaps with a liquid compound, avoiding bubbles and achieving lower thermal resistance.
Higher thermal resistance sets off a chain reaction: hotter dies leak charge from their capacitors faster, forcing the memory controller to refresh more often and eat into usable bandwidth. The packaging process also caps the maximum stack height, since more layers mean more mechanical stress and thermal-expansion mismatch, and yield control gets exponentially harder.
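As a first-order illustration of the power-times-resistance relationship (the wattage and resistance values below are hypothetical, chosen only to show the effect):

```python
def temp_rise_c(power_w: float, theta_c_per_w: float) -> float:
    # First-order thermal model: delta-T above ambient = power * thermal resistance
    return power_w * theta_c_per_w

# Same hypothetical 30 W stack under two packaging-dependent thermal resistances
print(temp_rise_c(30, 1.5))  # 45.0 C above ambient
print(temp_rise_c(30, 2.0))  # 60.0 C above ambient: hotter, so more refreshes
```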
When reading HBM spec sheets, focus on three things: the rated test voltage, the number of stacked layers and capacity per die, and who the core customers are. The final proof of technical strength is customer qualification and acceptance.
CXL: The next battlefield for AI cluster memory pooling
HBM solves the bandwidth bottleneck inside a single GPU. Once AI clusters scale to thousands of GPUs, inefficient memory allocation and incoherent caches across devices become the new pain points, and CXL technology emerged to address them.
Traditional data-center memory is physically bound to one server and cannot be shared across machines, so some nodes drown in KV cache while others' memory sits idle: the "stranded memory" problem, which typically leaves 20–30% of capacity unused and wastes serious capital. Meanwhile, CPU and GPU caches hold unsynchronized copies of data; keeping them coherent in software adds latency and performance loss, requires manual code adaptation, and tolerates faults poorly.
The root cause is the PCIe protocol, which suits bulk data transfers but has no cache-coherence mechanism. CXL keeps the PCIe physical layer and redefines the protocol above it, adding memory semantics and cache coherence. Hardware automatically maintains cache-state tags and completes synchronization within nanoseconds, with no OS or application code involved, and a fixed FLIT frame format simplifies parsing, bringing remote memory access latency down to 170–250 nanoseconds.
On top of that, CXL switches can build shared memory pools that break the physical binding to a single server, allocating idle memory at microsecond granularity and solving the stranded-memory problem outright.
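A toy example of what stranding means (the per-node numbers are invented for illustration only):

```python
nodes_free_gb = [5, 60, 8, 72]   # hypothetical idle DRAM on four servers
demand_gb = 80                   # one job that needs 80 GB in one place

print(any(free >= demand_gb for free in nodes_free_gb))  # False: no single node fits it
print(sum(nodes_free_gb) >= demand_gb)  # True: 145 GB pooled over CXL would fit easily
```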
Micron has launched CXL Type 3 memory-expansion modules built on its own DDR5 process, forming a complementary pairing with HBM: HBM delivers extreme bandwidth and low latency within a card, while CXL provides large-capacity expansion across nodes, supporting terabyte-scale pooling. In deployment, hot data stays in local HBM while long-lived cold data is offloaded to the CXL pool, with prefetching hiding the transfer latency, which makes serving models with million-token contexts practical.
In terms of industry landscape, HBM competition is already fierce, while CXL memory expansion is still early and its order unsettled. As a pure memory vendor with no legacy burden, Micron builds its CXL modules on standard DDR5 rather than complex stacked packaging, keeping yield and capacity under control and giving it an early edge in the race.
The industry's underlying economics and frontier technical bottlenecks
An advanced DRAM fab can cost $15–20 billion, with a single EUV lithography machine running over $200 million; add power and cooling infrastructure and the daily amortized cost is enormous, so utilization must stay above 95% to cover it. When demand falls, manufacturers find it very hard to cut production and can only absorb the pressure or start price wars. This is the fundamental root of the memory industry's cyclicality.
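A rough sense of the daily number, under assumed inputs (the $17.5B midpoint comes from the text's range; the five-year straight-line depreciation period is an assumption):

```python
fab_capex = 17.5e9                    # midpoint of the $15-20B range in the text
dep_years = 5                         # assumed straight-line depreciation period
daily_cost = fab_capex / (dep_years * 365)
print(f"${daily_cost / 1e6:.1f}M per day")  # ~$9.6M/day, before power and cooling
```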
HBM's high cost likewise stems from physical constraints. Multiple DRAM dies are stacked vertically, and a defect in any layer scraps the whole module, so yield falls exponentially with stack height: even at 95% per-die yield and 99% per-bond yield, an 8-layer HBM3E stack yields only about 61%, and 12-layer HBM4 lands below 50%. SK Hynix's liquid-fill packaging and Micron's process-yield work are ultimately both aimed at raising overall yield and cutting unit cost, but yield ramps and capacity expansion take time, so HBM prices are unlikely to fall in the short term.
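The compounding works out as below; the assumption that each die contributes one bonding step (including the bond onto the base logic die) is mine, chosen because it reproduces the text's figures:

```python
def hbm_stack_yield(layers: int, die_yield: float = 0.95, bond_yield: float = 0.99) -> float:
    # Every DRAM die must be good AND every bonding step must succeed;
    # assume one bond per die, counting the bond onto the base logic die.
    return die_yield ** layers * bond_yield ** layers

print(round(hbm_stack_yield(8), 3))   # 0.612: about 61% for 8-layer HBM3E
print(round(hbm_stack_yield(12), 3))  # 0.479: below 50% for 12-layer HBM4
```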
PIM (processing-in-memory), proposed more than twenty years ago, has yet to reach large-scale commercial use because the physics of the two processes conflict. DRAM transistors need low leakage and high threshold voltages to hold charge, which makes them slow to switch; CPU and GPU logic chases low thresholds and high switching frequency at the cost of leakage. Forcing compute units into a DRAM process would leave them far slower than GPUs, and the extra heat would accelerate capacitor leakage and threaten data reliability.
The industry’s current compromise is to integrate lightweight AI compute units into the bottom base die of HBM, manufactured with TSMC’s advanced logic process, avoiding DRAM process constraints. However, this still has a considerable gap from true storage-compute integration.
Long term, Micron's competitive logic is clear: use the 1-gamma process to drive down cost per bit, use high-margin HBM to secure pricing power, and use long-term supply agreements to smooth the cycle. The structural bottlenecks remain, though: planar DRAM scaling is approaching physical limits, yield losses grow with stack height, and compute-in-memory will not be commercialized soon. The next phase of competition will turn less on node leadership alone and more on the combined strength of yield engineering, packaging, and system integration, capabilities built on the memory giants' decades of accumulation.
It’s not hard to see that the chip industry’s iteration always falls into a cycle: insufficient computing power leads to larger chips, which constrain yield; switching to interconnect architectures introduces data transfer delays; stacking chips solves interconnect issues but creates heat dissipation problems, which further impair yield. Ultimately, the core of chip industry competition will return to materials science, with photonic interconnects, 2D semiconductor materials, and disruptive computing architectures potentially becoming key directions to break through existing physical limitations.