The Common Hidden Card of Warren Buffett: Top 10 Bottlenecks in Semiconductors



1. Interconnect (Copper Cable Electrical Interconnect)
The most immediate limit on cluster efficiency. High-speed copper links, such as the cables that carry NVLink, work over short distances (within a cabinet), but as per-lane rates approach 112 Gbps PAM4, skin-effect loss and crosstalk degrade the signal quickly, compressing the effective reach to under 1 meter. This means the physical topology of a GPU cluster is effectively "locked," and scaling out runs into a serious obstacle.
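To give a sense of scale for the skin effect at 112 Gbps PAM4, here is a rough sketch that computes the classical skin depth in copper. It assumes PAM4 carries 2 bits per symbol (so the Nyquist frequency is a quarter of the bit rate), a room-temperature copper resistivity of about 1.68e-8 ohm·m, and a non-magnetic conductor. The function names are illustrative and do not come from the original post.

```python
import math

MU_0 = 4 * math.pi * 1e-7   # vacuum permeability, H/m
RHO_COPPER = 1.68e-8        # assumed copper resistivity near 20 °C, ohm·m


def nyquist_frequency_hz(bit_rate_bps: float, bits_per_symbol: int = 2) -> float:
    """Nyquist frequency of a PAM signal: half the symbol rate."""
    symbol_rate = bit_rate_bps / bits_per_symbol
    return symbol_rate / 2


def skin_depth_m(frequency_hz: float, resistivity: float = RHO_COPPER) -> float:
    """Classical skin depth: delta = sqrt(rho / (pi * f * mu_0))."""
    return math.sqrt(resistivity / (math.pi * frequency_hz * MU_0))


if __name__ == "__main__":
    f = nyquist_frequency_hz(112e9)
    print(f"Nyquist frequency: {f / 1e9:.0f} GHz")
    print(f"Skin depth in copper: {skin_depth_m(f) * 1e6:.2f} um")
```

Under these assumptions the current is confined to a surface layer roughly 0.4 µm thick at 28 GHz, which is one reason conductor loss climbs so steeply with data rate.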

2. Photonics (Optical Interconnect)
A physical alternative to copper cables. Optical signals have significant advantages in power consumption and bandwidth density over long distances (across cabinets and data centers), but the current bottleneck is the electrical-optical conversion stage: turning electrical signals into optical ones and back requires lasers, modulators, and detectors. These devices rely largely on III-V compound semiconductors, which are far less mature in manufacturing and integration than CMOS, so capacity ramps up slowly.

3. EDA (Electronic Design Automation)
The mapping tool for chip complexity. For processes below 3nm, EDA must model quantum effects and process variation, pushing computational demand from roughly quadratic toward exponential growth. The global market is dominated by two giants whose long-built ecosystems of design databases and process libraries form a high barrier to entry. Startups find it very difficult to break through, and as a result tool iteration lags behind the needs of chip design.

4. Advanced Packaging (CoWoS/EMIB)
The physical assembly platform for compute chips. The bottleneck is not the technology but silicon interposer capacity. Interposers are produced on mature-node (65nm) fab lines, capacity that has long been taken up by CMOS image sensors and other mature-node chips. Expansion cycles take 12-18 months, which directly leaves GPU and HBM dies with no "bridge" to be assembled on.

5. Power Conversion (Voltage Regulation Modules)
The "translation layer" between the power grid and the chip. Power must be stepped down from high-voltage AC to about 1V DC at the chip, which requires multi-stage conversion. Traditional silicon MOSFETs have high switching losses at low voltage and high current, capping efficiency at 90%-92%. In a data center drawing hundreds of megawatts, each 1% efficiency gain saves millions of kWh a year, but the SiC/GaN devices that could deliver it are severely limited by substrate size and quality.
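The claim that each 1% efficiency gain saves millions of kWh a year can be checked with simple arithmetic. The sketch below assumes a hypothetical constant 100 MW chip-side load running all year and a conversion efficiency that improves from 90% to 91%. Both figures are illustrative choices, not data from the post.

```python
HOURS_PER_YEAR = 24 * 365


def input_power_mw(load_mw: float, efficiency: float) -> float:
    """Grid-side power needed to deliver load_mw through a converter chain."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return load_mw / efficiency


def annual_savings_kwh(load_mw: float, eta_before: float, eta_after: float) -> float:
    """Energy saved per year when chain efficiency rises from eta_before to eta_after."""
    delta_mw = input_power_mw(load_mw, eta_before) - input_power_mw(load_mw, eta_after)
    return delta_mw * 1000 * HOURS_PER_YEAR  # MW -> kW, then kW * hours


if __name__ == "__main__":
    saved = annual_savings_kwh(load_mw=100, eta_before=0.90, eta_after=0.91)
    print(f"Annual saving: {saved / 1e6:.1f} million kWh")
```

With these assumed inputs the saving comes to roughly 10.7 million kWh per year, consistent with the order of magnitude stated in the text.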

6. Cooling (Liquid Cooling)
A hard thermodynamic constraint. Air cooling tops out at a heat flux of about 50W/cm², but NVIDIA's B200 chips already have hotspots exceeding 100W/cm². Cooling is therefore shifting to liquid, either immersion or cold-plate systems, where the bottlenecks include the dielectric properties of the coolant and the sealing reliability of the piping. Retrofitting a data center also runs into civil-engineering and fire-safety regulations, so getting from zero to a first deployment takes a long time.
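As a rough illustration of what a cold-plate loop has to deliver, the steady-state heat balance Q = m_dot * c_p * dT gives the coolant flow needed for a given heat load. The sketch below assumes water as the coolant (specific heat about 4186 J/(kg·K), density about 1000 kg/m³), a hypothetical 1 kW device, and a 10 K coolant temperature rise. None of these numbers come from the post.

```python
WATER_CP = 4186.0        # assumed specific heat of liquid water, J/(kg*K)
WATER_DENSITY = 1000.0   # assumed density of liquid water, kg/m^3


def required_mass_flow_kg_s(heat_w: float, delta_t_k: float, cp: float = WATER_CP) -> float:
    """Mass flow that carries heat_w away with a coolant temperature rise of delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_w / (cp * delta_t_k)


def to_litres_per_minute(mass_flow_kg_s: float, density_kg_m3: float = WATER_DENSITY) -> float:
    """Convert a mass flow in kg/s to a volume flow in L/min."""
    return mass_flow_kg_s / density_kg_m3 * 1000 * 60


if __name__ == "__main__":
    m_dot = required_mass_flow_kg_s(heat_w=1000, delta_t_k=10)
    print(f"Mass flow: {m_dot:.4f} kg/s")
    print(f"Volume flow: {to_litres_per_minute(m_dot):.2f} L/min")
```

Dielectric immersion fluids typically have a lower specific heat than water, so the same heat load would need proportionally more flow. Passing a different `cp` and density shows the effect.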

7. New Materials (Substrate Substitutes)
Attempts to change the underlying physical properties. This is not a single field but a set of breakthroughs aimed at the bottlenecks above: GaN/SiC for power conversion, InP for photonic transceivers, synthetic diamond (with thermal conductivity about five times that of copper) for packaging and heat dissipation, and glass substrates to address warpage in large-format packaging. For each material, the production process (e.g., vapor deposition of diamond wafers) and heterogeneous integration (how to combine it with silicon) are long-running engineering challenges.
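To see what "five times the thermal conductivity of copper" buys, one-dimensional Fourier conduction gives the temperature drop across a heat spreader as dT = q * t / k. The sketch below assumes copper at about 400 W/(m·K), diamond at five times that (per the text), the 100W/cm² hotspot flux quoted in the cooling item, and a hypothetical 1 mm spreader thickness. It ignores lateral spreading and interface resistance.

```python
K_COPPER = 400.0             # assumed thermal conductivity of copper, W/(m*K)
K_DIAMOND = 5 * K_COPPER     # "five times that of copper", as stated in the text


def temperature_drop_k(flux_w_per_cm2: float, thickness_mm: float, k_w_per_m_k: float) -> float:
    """1-D Fourier conduction: dT = q * t / k, with q in W/m^2 and t in m."""
    flux_w_per_m2 = flux_w_per_cm2 * 1e4
    thickness_m = thickness_mm * 1e-3
    return flux_w_per_m2 * thickness_m / k_w_per_m_k


if __name__ == "__main__":
    for name, k in (("copper", K_COPPER), ("diamond", K_DIAMOND)):
        dt = temperature_drop_k(flux_w_per_cm2=100, thickness_mm=1.0, k_w_per_m_k=k)
        print(f"{name}: {dt:.2f} K across a 1 mm spreader")
```

Under these assumptions the drop is about 2.5 K for copper and 0.5 K for diamond, a fivefold reduction that follows directly from the conductivity ratio.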

8. Memory (HBM/DRAM/NAND)
The "vascular system" that feeds data to compute. HBM relies on TSVs (Through-Silicon Vias) and micro-bumps, and its yields are far lower than those of standard DRAM. The constraint in AI training is spreading from HBM shortages to DRAM bandwidth and SSD capacity, meaning the manufacturing capacity of the whole memory and storage chain (especially the capital-expenditure pace of the Korean manufacturers) cannot keep up with the exponential growth in large-model parameters.

9. Helium
The "blood" of wafer fabs. Core equipment for lithography, etching, and vapor deposition requires high-purity helium as a carrier gas or cooling medium. Helium is extracted as a by-product of natural gas, over 90% of it is supplied by the US, Qatar, and Russia, and it is non-renewable. A supply disruption hits not only advanced processes but also yields in mature processes.

10. Power
The absolute ceiling on everything above. Grid expansion involves transformers, high-voltage corridors, and grid-connection approvals, and typically takes 3-5 years. AI clusters also show huge instantaneous power swings (e.g., during synchronized gradient updates in training), which pose severe challenges for grid peak regulation. Without spare power capacity, the cabinets cannot be powered on even if chips, packaging, and liquid cooling are all ready.