A paper stirs a trillion-dollar market; the storage chip industry is collapsing...



No one expected that when the U.S. stock market opened this Wednesday, the storage chip sector would run into a "black day," with the sector's giants all sliding into the red:

By the close, Micron Technology fell 4%, Western Digital dropped 4.4%, Seagate declined 5.6%, and SanDisk was hit even harder with a 6.5% plunge.

The trigger for this sell-off was none other than Google's release of the TurboQuant compression algorithm.

As is well known, when large models run inference, the KV cache is a "gold-eating beast" that devours memory.

To avoid recomputing earlier tokens, an LLM keeps a "working memory" of their keys and values, and this cache snowballs as the conversation lengthens, growing with every token added.
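
To make the growth concrete, here is a minimal back-of-the-envelope sketch of the KV cache size for a single sequence. The function name and the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) are hypothetical choices for illustration only and do not describe any specific model from the article.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bits_per_value=16):
    """Approximate KV cache size in bytes for one sequence."""
    # Both keys and values are cached for every layer, hence the factor of 2.
    values = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return values * bits_per_value / 8

# Hypothetical model shape, chosen only for illustration.
for tokens in (1_000, 10_000, 100_000):
    gib = kv_cache_bytes(tokens, n_layers=32, n_kv_heads=8, head_dim=128) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.2f} GiB")
```

The size scales linearly with the number of tokens, and the `bits_per_value` parameter shows directly how fewer bits per cached value shrink the total.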

Google's TurboQuant offers an extremely "brutal" slimming solution:

First, it applies a random "rotation" to the high-dimensional vectors in the KV cache and then describes them in a different coordinate system, which reduces the extra memory overhead that quantization normally carries (such as separately stored scaling constants) to zero.
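
The idea of rotating before quantizing can be illustrated with a small sketch. This is not TurboQuant's actual procedure: it swaps in a randomized Hadamard transform (random sign flips followed by a Walsh-Hadamard transform) as a cheap stand-in for a random rotation, uses a plain uniform quantizer, and assumes unit-norm input vectors. All function names are hypothetical. The point it shows is that a rotation spreads an outlier's energy across all coordinates, so one fixed quantization range can serve every vector.

```python
import math
import random

def fwht(x):
    """Normalized fast Walsh-Hadamard transform (length must be a power of 2)."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return [v / math.sqrt(n) for v in x]

def rotate(x, signs):
    # Random sign flips followed by a Hadamard transform: an orthogonal map.
    return fwht([s * v for s, v in zip(signs, x)])

def unrotate(y, signs):
    # The normalized Hadamard transform is its own inverse.
    return [s * v for s, v in zip(signs, fwht(y))]

def quantize(y, bits, limit):
    """Map each value in [-limit, limit] to one of 2**bits integer codes."""
    levels = 2 ** bits
    step = 2 * limit / levels
    return [min(levels - 1, max(0, math.floor((v + limit) / step))) for v in y]

def dequantize(codes, bits, limit):
    """Return the midpoint of each code's interval."""
    step = 2 * limit / 2 ** bits
    return [-limit + (c + 0.5) * step for c in codes]

random.seed(0)
n = 64
x = [random.gauss(0, 0.1) for _ in range(n)]
x[3] = 8.0                                   # one outlier coordinate
norm = math.sqrt(sum(v * v for v in x))
x = [v / norm for v in x]                    # assumption: unit-norm input
signs = [random.choice((-1.0, 1.0)) for _ in range(n)]

y = rotate(x, signs)
limit = 3 / math.sqrt(n)                     # fixed range, chosen for this sketch
codes = quantize(y, bits=3, limit=limit)
x_hat = unrotate(dequantize(codes, 3, limit), signs)

err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
print("largest |coordinate| before rotation:", max(abs(v) for v in x))
print("largest |coordinate| after rotation: ", max(abs(v) for v in y))
print("reconstruction error (L2):", err)
```

Because the rotation is orthogonal, it preserves lengths and inner products, and the quantization error measured after undoing the rotation equals the error made in the rotated space.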

Next, it uses just 1 bit of extra space to insert a mathematical "calibrator" that precisely eliminates the systematic bias introduced by compression.
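
One known way to build a 1-bit corrector of this kind is a sign sketch of the quantization residual, in the style of a quantized Johnson-Lindenstrauss transform. The article does not spell out TurboQuant's construction, so the sketch below is an assumption for illustration: the function names are hypothetical, the 0.5-grid rounding is a crude stand-in quantizer, and the sketch also keeps the residual's norm as one extra scalar. It stores only the signs of random Gaussian projections of the residual, and uses them to add a correction to the inner product whose error averages out to zero.

```python
import math
import random

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def sign_bits(r, projections):
    """One bit per projection: the sign of each random projection of r."""
    return [1.0 if dot(s, r) >= 0 else -1.0 for s in projections]

def estimate_dot(q, bits, r_norm, projections):
    """Unbiased estimate of <q, r> from the sign bits and the norm of r."""
    total = sum(dot(s, q) * b for s, b in zip(projections, bits))
    return math.sqrt(math.pi / 2) * r_norm * total / len(projections)

random.seed(0)
d = 32
x = [random.gauss(0, 1) for _ in range(d)]   # a cached vector
q = [random.gauss(0, 1) for _ in range(d)]   # a query vector
q_norm = math.sqrt(dot(q, q))
q = [v / q_norm for v in q]

x_hat = [round(2 * v) / 2 for v in x]        # crude stand-in quantizer (0.5 grid)
r = [a - b for a, b in zip(x, x_hat)]        # residual the quantizer discarded
r_norm = math.sqrt(dot(r, r))

exact = dot(q, x)
naive = dot(q, x_hat)                        # ignores the residual entirely

# Average over many independent projection matrices to show unbiasedness.
trials = 400
corrected_sum = 0.0
for _ in range(trials):
    projections = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    bits = sign_bits(r, projections)         # d sign bits: 1 extra bit per coordinate
    corrected_sum += naive + estimate_dot(q, bits, r_norm, projections)
corrected_mean = corrected_sum / trials

print("exact inner product:    ", exact)
print("naive (quantized only): ", naive)
print("corrected, averaged:    ", corrected_mean)
```

The averaging loop is only there to demonstrate that the correction is unbiased. A practical system would presumably use a single projection matrix derived from a shared seed, so that only the sign bits need to be stored alongside each vector.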

The TurboQuant paper will be officially published at ICLR 2026 next month.

The results are outstanding: without any retraining, TurboQuant compresses the cache down to an astonishing 3 bits per value.

This reduces KV cache overhead by a factor of 6, and importantly, inference performance is almost unaffected.

On an H100 GPU, compared with the 32-bit baseline, 4-bit computation speeds up attention processing by 8 times. It not only saves space but also runs faster.