Ramp Labs proposes a new multi-agent memory sharing solution, reducing token consumption by up to 65%
ME News, April 11 (UTC+8). AI infrastructure company Ramp Labs has released its research findings, “Latent Briefing.” The proposed method enables efficient memory sharing among multi-agent systems by directly compressing large-model KV caches, significantly reducing token consumption without sacrificing accuracy.
In mainstream multi-agent architectures, an Orchestrator decomposes tasks and repeatedly calls Worker models; as the reasoning chain grows longer, token usage grows exponentially.
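A minimal sketch of why costs compound in such a loop (the function name and numbers are hypothetical, not from the report): if every Worker call re-sends the full accumulated context as its prompt, total prompt tokens grow much faster than the output itself.

```python
def tokens_spent(step_outputs):
    """Total prompt tokens when each worker call receives all prior output.

    step_outputs: tokens of output produced at each step of the chain.
    """
    total, context = 0, 0
    for out in step_outputs:
        total += context   # the whole prior context is re-sent as the prompt
        context += out     # this step's output is appended to the context
    return total

# Five steps, each producing 1,000 tokens of output:
print(tokens_spent([1000] * 5))  # 10000 prompt tokens for 5000 output tokens
```

Compressing the shared context, as Latent Briefing proposes, attacks exactly this re-sent portion.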
The core idea of Latent Briefing is to use attention mechanisms to identify the truly critical parts in the context, then discard redundant information directly at the representation layer—rather than relying on slow LLM summaries or RAG retrievals with unstable performance.
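The report does not publish its algorithm here, but the idea can be sketched as attention-guided pruning of a cached context: score each cached position against a query, keep only the highest-weight entries, and drop the rest at the representation layer. All names, shapes, and the `keep_ratio` parameter below are illustrative assumptions, not the actual Latent Briefing implementation.

```python
import numpy as np

def compress_kv_cache(keys, values, query, keep_ratio=0.35):
    """Keep only the cached key/value entries with the highest
    attention weight w.r.t. the query; discard the rest.

    keys, values: (n_positions, head_dim) cached representations.
    query: (head_dim,) probe vector.
    """
    d = keys.shape[-1]
    # Scaled dot-product attention scores of the query against each key.
    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Retain the top keep_ratio fraction of positions (at least one),
    # restoring original token order after selection.
    k = max(1, int(len(keys) * keep_ratio))
    top = np.sort(np.argsort(weights)[-k:])
    return keys[top], values[top]

# Toy usage: 10 cached positions, 4-dim heads, keep roughly a third.
rng = np.random.default_rng(0)
K = rng.normal(size=(10, 4))
V = rng.normal(size=(10, 4))
q = rng.normal(size=(4,))
K_small, V_small = compress_kv_cache(K, V, q, keep_ratio=0.35)
print(K_small.shape)  # (3, 4)
```

Because selection happens on the representations themselves, no extra LLM summarization call is needed, which is the contrast with summary- or RAG-based context reduction that the report draws.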
On the LongBench v2 benchmark, the method performed strongly. Worker-model token consumption fell by 65%, and median token savings on medium-length documents (32k to 100k tokens) reached 49%. Overall accuracy improved by about 3 percentage points over the baseline, while each compression added only about 1.7 seconds of latency, roughly a 20× speedup over the original algorithm.
The experiments used Claude Sonnet 4 as the Orchestrator and Qwen3-14B as the Worker model, covering a variety of document scenarios including academic papers, legal documents, novels, and government reports.
The study also found that the optimal compression threshold varies depending on task difficulty and document length: difficult problems are better suited to more aggressive compression to filter speculative reasoning noise, while long documents are more suitable for lighter compression to preserve dispersed key information. (Source: BlockBeats)