This episode of the All In Podcast is packed with information. Here are the highlights I found most worth sharing.

Let's start with the Chinese open-source model thread. Progress is faster than expected.

Zhipu AI released GLM 5.2, its next-generation frontier-level open-source model, with 744 billion parameters and a 1M-token context window, under the MIT license. The benchmark results are striking: it beats GPT-5.5 on software engineering benchmarks and trails Anthropic's top-tier Claude Opus 4.8 by less than 1%, yet its API price is a full 85% lower than that of comparable American models.

There was an interesting detail in the show. One method Chinese teams use to accelerate their catch-up is to build farms of tens of thousands of phones and iPads, use encrypted accounts to hit the APIs of top U.S. frontier models at high volume, harvest the inference traces those models return, and feed them back into their own open-source models for reinforcement training. In effect, the answers that U.S. labs spent huge sums to produce are used as cheat sheets, yielding comparable performance at very low cost.

Sacks had a sharp take on this. He criticized Anthropic's Dario Amodei for previously pushing the U.S. government to set up burdensome security approval processes, arguing that such self-imposed restrictions have slowed America's own pace: the Fable model was taken down over jailbreak allegations, and approval for OpenAI's new model has also been a struggle. His assessment is that Chinese models currently lag by about 9 months in model technology and about 24 months in chips, yet the entire GLM 5 family has already been trained on domestic chips such as Huawei's Ascend. In the future, cheap and effective "AI boxes" optimized for those domestic chips could be dumped on the global market at low prices, while the U.S., through its own restrictions, effectively cedes this trillion-dollar export market.

On Micron's recent earnings report, the show gave a precise assessment: DRAM is the true bottleneck for the entire AI wave.

Micron's quarterly revenue more than quadrupled year-over-year, from $9 billion to $42 billion, and guidance far exceeded expectations. HBM capacity for 2026 is already sold out.

One point made in the show was blunt: people used to hunt for obscure Japanese component suppliers on Twitter as "bottleneck stocks," but the only real chokepoint is DRAM, especially HBM. The reason is simple: memory bandwidth and capacity set the physical ceiling on inference performance for every large model, a hard constraint that cannot be bypassed. It was even mentioned that the core technology of the super factory Musk is building is DRAM, not fiber optics, power supplies, or NAND flash.
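The bandwidth ceiling can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a dense model whose weights are all read from memory once per generated token, which gives an upper bound on single-stream decode speed. Only the 744 billion parameter count comes from the post; the 8-bit weights and the 8 TB/s aggregate bandwidth are illustrative assumptions, and the function name is hypothetical.

```python
def decode_tokens_per_second_ceiling(params_billion, bytes_per_param, bandwidth_tb_per_s):
    """Upper bound on single-stream decode speed.

    Assumes every weight is read from memory once per generated token
    (dense model, batch size 1, KV-cache traffic ignored).
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = bandwidth_tb_per_s * 1e12
    return bandwidth_bytes_per_s / model_bytes


# 744B parameters is from the post; 1 byte per parameter and 8 TB/s of
# aggregate memory bandwidth are assumptions chosen for illustration.
ceiling = decode_tokens_per_second_ceiling(744, 1.0, 8.0)
print(f"{ceiling:.1f} tokens/s")  # prints "10.8 tokens/s"
```

Under these assumptions, faster arithmetic units do not raise the bound at all; only more bandwidth or fewer bytes per token does. A sparse (mixture-of-experts) model would read fewer weights per token, so its ceiling would be higher.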

Micron also made an interesting change to its business model this time: it signed long-term supply agreements with core cloud providers that include price floor and ceiling protections, locking in 50% of future revenue. This means that even if the industry cycle turns downward, the gross margin at the minimum contract price would still be higher than the peak gross margin of any previous cycle.
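A price floor and ceiling of this kind behaves like a clamp on the spot price. The following is a minimal sketch of that mechanism only; the function name and all prices are hypothetical, and the actual contract terms were not described in the show.

```python
def contract_price(spot_price, floor, ceiling):
    """Price paid under a supply agreement with floor and ceiling protection."""
    if floor > ceiling:
        raise ValueError("floor must not exceed ceiling")
    # Below the floor the buyer pays the floor; above the ceiling the buyer
    # pays the ceiling; in between the spot price applies.
    return min(max(spot_price, floor), ceiling)


# Hypothetical prices per unit, for illustration only.
for spot in (2.0, 6.0, 12.0):
    print(spot, "->", contract_price(spot, floor=4.0, ceiling=9.0))
```

The floor protects the supplier in a downturn and the ceiling protects the buyer in a shortage, which is why both sides would accept a multi-year commitment.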

On barriers to entry: China's CXMT is preparing for an IPO and may in the future supply low-priced mid- to low-end consumer memory that eases cost pressure for major companies like Apple. But in the top-tier HBM segment required for AI servers, the process difficulty is extremely high, and only three companies in the world (Micron, SK Hynix, and Samsung) can currently produce it. That gap cannot be closed in the short term.

The show offered a fairly bold prediction: next year, 30% to 40% of global hyperscaler capital expenditure will go directly to DRAM chip vendors. This cost surge has already forced Apple to raise retail prices across the MacBook and Mac Studio lineups.

The edge computing and distributed inference segment is the most imaginative part of this episode. Here are a few ideas I found intriguing.

Tesla filed a trademark for "Megapod" on June 18. The underlying logic: building a 1 GW data center on the ground involves extremely lengthy approval processes for land, energy, and liquid cooling. The Megapod concept integrates GPUs, a battery network, and cooling systems into a containerized modular data center that can be dropped directly into Tesla's already-approved Supercharger station network, where grid connections and idle land are already available. That bypasses the two biggest bottlenecks in traditional data center construction: approval and power access.

The logic of distributed inference is also interesting. Model inference can be split into two stages: the Prefill stage, which processes the prompt and is compute-intensive, and the Decode stage, which generates output token by token and is bandwidth- and memory-intensive. Well-capitalized players can buy up depreciated older GPUs, pair them with front-end chips specifically optimized for decoding, and form lower-cost distributed inference networks.
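The difference between the two stages can be sketched with a simple roofline-style estimate. The model below is deliberately simplified: it counts about 2 floating-point operations per weight per token, assumes the weights are read once per forward pass, and ignores attention and KV-cache traffic. The hardware figures (1000 TFLOP/s peak, 4 TB/s bandwidth) and function names are hypothetical, not numbers from the show.

```python
def arithmetic_intensity(tokens_per_pass, bytes_per_param):
    """FLOPs performed per byte of weights read in one forward pass.

    Simplified: ~2 FLOPs per weight per token, weights read once per pass.
    """
    return 2.0 * tokens_per_pass / bytes_per_param


def bound_by(tokens_per_pass, bytes_per_param, peak_tflops, bandwidth_tb_per_s):
    """Return 'compute' or 'memory' depending on which limit is hit first."""
    # Ridge point: the intensity at which compute and bandwidth are balanced.
    ridge = (peak_tflops * 1e12) / (bandwidth_tb_per_s * 1e12)
    intensity = arithmetic_intensity(tokens_per_pass, bytes_per_param)
    return "compute" if intensity >= ridge else "memory"


# Hypothetical accelerator: 1000 TFLOP/s peak, 4 TB/s memory bandwidth,
# 16-bit weights (2 bytes per parameter).
print("prefill (2048-token prompt):", bound_by(2048, 2.0, 1000.0, 4.0))
print("decode (1 token per step):  ", bound_by(1, 2.0, 1000.0, 4.0))
```

In this simplified picture, prefill amortizes each weight read over the whole prompt, while decode reads all the weights to produce a single token. That is why hardware with modest compute but ample memory bandwidth can still be useful for the decode stage.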

An even crazier idea: in the future, consumers who buy Powerwall home batteries could be offered discounts, with every battery shipping with a built-in AI chip paired with Starlink satellite connectivity. When idle, the batteries would automatically form a massive distributed P2P inference pool, providing a continuous stream of nearly free offshore compute power. If this vision ever materializes, it would be an overwhelming, asymmetric blow to traditional cloud giants.

The wildest part is space compute. Building a 1 GW data center on the ground costs $35 billion in chips plus $25 billion in cooling and labor, on top of various land-use disputes. But with SpaceX's Starship achieving full reusability, the cost of launching 1 GW of compute into orbit, linked by laser interconnects, could drop to just $5 billion. The naturally cold environment of space and near-unlimited solar energy could make the operating economics of a space data center surpass those of terrestrial data centers within three to four years.
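The figures quoted above can be laid side by side. The show's numbers do not say whether the $5 billion launch cost replaces only the cooling and labor line or the chips as well, so the sketch below assumes the chips cost the same in both cases. That assumption is mine, not the panelists'.

```python
# Figures quoted in the post, in billions of U.S. dollars per 1 GW.
GROUND_CHIPS_B = 35.0
GROUND_COOLING_LABOR_B = 25.0
ORBIT_LAUNCH_B = 5.0

ground_total = GROUND_CHIPS_B + GROUND_COOLING_LABOR_B

# Assumption: the same chips are still bought for orbit, and launch
# replaces the terrestrial cooling and labor line entirely.
orbit_total = GROUND_CHIPS_B + ORBIT_LAUNCH_B

savings = ground_total - orbit_total
print(f"ground: ${ground_total:.0f}B, orbit: ${orbit_total:.0f}B, "
      f"difference: ${savings:.0f}B ({savings / ground_total:.0%})")
```

Under this reading the saving comes entirely from the non-chip costs, so the result is sensitive to whatever orbital hardware, radiators, and maintenance would add back.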
