Google Pixel deploys zero-copy MTP, speeding up Gemini Nano inference by over 50% and saving memory
According to Beating's monitoring, Google has deployed a Multi-Token Prediction (MTP) architecture on Pixel 9 and Pixel 10 series devices to accelerate the built-in Gemini Nano v3 model. By attaching a lightweight Transformer prediction head to the tail of the frozen main model, the new architecture improves on-device inference speed by more than 50% while fully retaining the original safety alignment and output quality.
Traditional speculative decoding requires running an independent draft model to predict candidate tokens. This not only occupies additional runtime memory on the phone, but also limits prediction accuracy because the independent model cannot access the internal hidden states of the main model.
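The draft-then-verify loop described above can be sketched as follows. This is a minimal illustration with greedy acceptance, not Google's implementation: `speculative_step`, `main_next`, and `draft_next` are hypothetical names, and the two "models" are toy functions over integer tokens.

```python
from typing import Callable, List

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextTokenFn,
                     main_next: NextTokenFn, k: int) -> List[Token]:
    """One draft-then-verify step with greedy acceptance.

    Returns the tokens appended in this step (always at least one).
    """
    # 1. The draft model proposes k candidate tokens autoregressively.
    candidates: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        token = draft_next(ctx)
        candidates.append(token)
        ctx.append(token)

    # 2. The main model verifies the candidates in order. A real system
    #    scores all positions in one batched pass; this sketch calls the
    #    main model per position for clarity.
    accepted: List[Token] = []
    ctx = list(prefix)
    for token in candidates:
        expected = main_next(ctx)
        if token != expected:
            accepted.append(expected)  # keep the main model's correction
            return accepted
        accepted.append(token)
        ctx.append(token)

    # Every candidate matched, so the main model adds one bonus token.
    accepted.append(main_next(ctx))
    return accepted


# Toy models (assumptions for illustration): the main model counts
# upward modulo 10; the draft model agrees except right after token 3.
def main_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
```

With greedy acceptance the output always matches what the main model would have produced alone; the draft only changes how many tokens each verification step yields.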
The new architecture instead attaches the MTP head to the tail of the frozen main model, so the head reuses the feature activations the main model has already computed, which significantly improves the prediction accuracy of candidate tokens.
To avoid redundant runtime memory overhead from draft computation during autoregressive generation, Google designed a zero-copy mechanism. In traditional solutions, the draft model must maintain its own key-value (KV) cache while generating candidate tokens. The zero-copy mechanism instead lets the attached prediction head read the main model's existing cache directly through cross-attention. This eliminates the startup latency of draft prediction and saves about 130 MB of runtime memory on the phone.
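The idea of reading the main model's cache in place, rather than maintaining a second one, can be sketched as below. This is a toy, single-head example in plain Python; `KVCache`, `PredictionHead`, and `cross_attend` are hypothetical names, and the real on-device data layout is not described in the report.

```python
import math
from typing import List

Vector = List[float]


class KVCache:
    """Keys and values written by the main model, one entry per token."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def append(self, key: Vector, value: Vector) -> None:
        self.keys.append(key)
        self.values.append(value)


def cross_attend(query: Vector, cache: KVCache) -> Vector:
    """Scaled dot-product attention of one query over the cached entries."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key))
              for key in cache.keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(cache.values[0])
    return [sum(w * v[i] for w, v in zip(weights, cache.values)) / total
            for i in range(dim)]


class PredictionHead:
    """Holds a reference to the main model's cache, never a copy."""

    def __init__(self, shared_cache: KVCache) -> None:
        self.cache = shared_cache  # same object, so no extra KV memory

    def read(self, query: Vector) -> Vector:
        return cross_attend(query, self.cache)
```

Because the head keeps only a reference, entries the main model appends later are visible to the head immediately, with no copy step and no second cache to fill before drafting starts.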
In practical Pixel tasks such as notification summaries and text proofreading, the MTP architecture lets the model correctly predict nearly 2 additional tokens per inference on average. This reduces how often the main processor is woken up for verification, which saves system power. In highly structured text generation tasks such as smart replies, the token acceptance rate improves by up to 55%.
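To see why extra accepted tokens reduce verification wake-ups, here is a back-of-the-envelope calculation. It assumes that "additional tokens per inference" means drafted tokens accepted on top of the one token each main-model pass always produces; the helper name and the 60-token output length are illustrative, not figures from the report.

```python
import math


def main_model_passes(num_tokens: int, extra_accepted_per_pass: float) -> int:
    """Main-model passes needed to emit num_tokens tokens.

    Each pass yields one token from the main model plus, on average,
    extra_accepted_per_pass accepted draft tokens (an assumed reading
    of the reported figure).
    """
    tokens_per_pass = 1.0 + extra_accepted_per_pass
    return math.ceil(num_tokens / tokens_per_pass)


baseline = main_model_passes(60, 0.0)  # plain autoregressive decoding
with_mtp = main_model_passes(60, 2.0)  # about 2 extra tokens per pass
```

Under these assumptions a 60-token output drops from 60 main-model passes to 20. This counts passes only; it is not a wall-clock estimate, since the prediction head and verification add their own cost.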