A paper stirs a trillion-dollar market; the storage chip industry is collapsing...
No one expected that when the U.S. market opened this Wednesday, the storage chip sector would suffer a "black day," with the sector's giants all deep in the red:
By the close, Micron Technology had fallen 4%, Western Digital 4.4%, and Seagate 5.6%, while SanDisk was hit hardest with a 6.5% plunge.
The trigger for the sell-off was none other than Google's newly released TurboQuant compression algorithm.
As is well known, when a large model runs, its KV cache is a voracious consumer of memory.
To avoid recalculating previous tokens, an LLM maintains this "working memory," and it snowballs as the conversation gets longer.
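To see why, here is a back-of-the-envelope sketch of the cache's footprint; all model dimensions below are hypothetical, chosen for illustration rather than taken from the article:

```python
# KV-cache footprint for a hypothetical 70B-class transformer.
# Every number here is an illustrative assumption.
num_layers = 80        # transformer layers
num_kv_heads = 8       # KV heads (grouped-query attention)
head_dim = 128         # dimension per head
bytes_per_value = 2    # FP16 storage

def kv_cache_bytes(seq_len: int) -> int:
    # x2 because both keys and values are cached, at every layer,
    # for every token seen so far, hence linear growth in seq_len.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

for tokens in (1_000, 32_000, 128_000):
    print(f"{tokens:>7} tokens -> {kv_cache_bytes(tokens) / 2**30:6.2f} GiB")
```

At a 128k-token context, this toy configuration already needs roughly 39 GiB for the cache alone, before counting weights or activations.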
Google's TurboQuant offers a remarkably aggressive "slimming" solution:
First, it applies a rotation to the high-dimensional vectors in the KV cache, re-expressing them in a different coordinate system so that no single outlier dimension dominates and the values become far easier to represent in very few bits.
Next, using just 1 extra bit, it embeds a mathematical "calibrator" that cancels out the systematic bias introduced by compression.
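The paper's actual construction is more sophisticated, but a toy sketch can show why rotating before quantizing helps. With plain uniform quantization, one outlier coordinate forces a coarse grid that rounds everything else to zero; a random rotation spreads that outlier's energy across all coordinates, so the grid can be much finer. The code below is an illustrative assumption, not Google's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization to `bits` bits, then dequantize."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels at 4 bits
    scale = np.abs(x).max() / levels      # the largest entry sets the grid
    return np.round(x / scale) * scale

d = 256
v = rng.normal(size=d)   # typical activations...
v[0] = 50.0              # ...plus one large outlier coordinate

# Random orthogonal rotation via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

plain = quantize(v, bits=4)               # outlier dictates a coarse grid
rotated = Q.T @ quantize(Q @ v, bits=4)   # rotate, quantize, rotate back

print("4-bit error without rotation:", np.linalg.norm(v - plain))
print("4-bit error with rotation:   ", np.linalg.norm(v - rotated))
```

On this toy input, the rotated path gives a markedly smaller reconstruction error; TurboQuant pairs a transform of this kind with its 1-bit calibrator to push the budget even lower.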
The TurboQuant paper will be officially published at ICLR 2026 next month.
The results are striking: without any retraining, TurboQuant compresses the cache down to an astonishing 3 bits per value.
This cuts KV-cache overhead by a factor of 6, and, crucially, inference quality is almost unaffected.
On an H100, 4-bit computation speeds up the attention step by 8x over the 32-bit baseline. It doesn't just save space; it also runs faster.
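Plugging the stated factor of 6 into the earlier back-of-the-envelope model shows the practical stakes; the dimensions are still the same hypothetical ones:

```python
# What a 6x KV-cache reduction means for the hypothetical model above.
BYTES_PER_TOKEN_FP16 = 2 * 80 * 8 * 128 * 2   # keys+values, FP16
REDUCTION = 6                                  # factor stated in the article

for tokens in (32_000, 128_000):
    fp16_gib = BYTES_PER_TOKEN_FP16 * tokens / 2**30
    print(f"{tokens:>7} tokens: {fp16_gib:6.2f} GiB -> {fp16_gib / REDUCTION:5.2f} GiB")
```

In this toy setup, a 32k-token conversation drops from roughly 9.8 GiB of cache to about 1.6 GiB.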