Claude's Chinese tokens: the same content uses 65% more tokens than English, while on OpenAI it's only 15% more
According to Beating Monitoring, AI researcher Aran Komatsuzaki translated Rich Sutton's well-known essay "The Bitter Lesson" into nine languages and fed them into the tokenizers of six models: OpenAI, Gemini, Qwen, DeepSeek, Kimi, and Claude. Using the token count of the original English text on OpenAI's tokenizer as a 1x baseline, he compared how many times that count each language needed on each model. The results: the same content in Chinese consumed 1.65x the baseline on Claude but only 1.15x on OpenAI. Hindi was even more extreme on Claude, at more than 3x the baseline. Of the six models tested, Anthropic's came in last.
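A minimal sketch of that measurement, assuming the `tiktoken` and Hugging Face `transformers` packages are installed; the file names and the Qwen model ID are illustrative choices, not necessarily what Aran used, and Claude's and Gemini's counts would have to come from their providers' count-tokens APIs since those tokenizers are not published:

```python
# A rough reproduction of the comparison, not the original script.
import tiktoken
from transformers import AutoTokenizer

def openai_count(text: str) -> int:
    # OpenAI's o200k_base BPE (the GPT-4o encoding) via tiktoken.
    return len(tiktoken.get_encoding("o200k_base").encode(text))

def hf_count(model_id: str, text: str) -> int:
    # Token count under an open model's tokenizer pulled from Hugging Face.
    tok = AutoTokenizer.from_pretrained(model_id)
    return len(tok.encode(text, add_special_tokens=False))

# Placeholder files: the English original of "The Bitter Lesson"
# and a Chinese translation of the same text.
english = open("bitter_lesson_en.txt", encoding="utf-8").read()
chinese = open("bitter_lesson_zh.txt", encoding="utf-8").read()

baseline = openai_count(english)  # 1x reference: English original on OpenAI's tokenizer

counts = {
    "OpenAI": openai_count(chinese),
    # DeepSeek's and Kimi's open tokenizers can be loaded the same way
    # with their respective Hugging Face model IDs.
    "Qwen": hf_count("Qwen/Qwen2.5-7B-Instruct", chinese),
}
for model, n in counts.items():
    print(f"{model}: Chinese text = {n / baseline:.2f}x the English baseline")
```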
Translation changes the length of a text, so the multiples relative to English are not perfectly precise. More convincing is how the same Chinese paragraph performs across different models (still against the same baseline): Kimi used only 0.81x (fewer tokens than the English original), Qwen used 0.85x, and Claude rose to 1.65x. The text is identical; the gap is purely tokenizer efficiency. Chinese models tokenize Chinese even more compactly than English, which shows the problem is not Chinese itself but whether the tokenizer has been optimized for the language.
For users, more tokens mean the API costs more, the model takes longer to start responding, and the context window fills up faster. Tokenizer efficiency tracks each language's share of the training data: with more English data, English words get merged into longer, more efficient tokens; with less non-English data, the same text can only be cut into smaller, more fragmented pieces. Aran's conclusion: whoever has the bigger market saves the most tokens.
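To make the downstream effect concrete, here is a back-of-the-envelope calculation using the article's 1.65x Chinese multiplier; the per-token price and context-window size are hypothetical placeholders, not any provider's actual figures:

```python
# What a 1.65x tokenizer multiplier means for cost and context,
# using made-up price and window figures purely for illustration.
multiplier = 1.65            # Chinese tokens vs. the English baseline (per the article)
price_per_mtok = 3.00        # hypothetical input price, USD per million tokens
context_window = 200_000     # hypothetical context window, in tokens

english_tokens = 10_000                       # a prompt measured in baseline tokens
chinese_tokens = english_tokens * multiplier  # same content, tokenized less efficiently

print(f"Input cost:  ${english_tokens / 1e6 * price_per_mtok:.3f} (English) "
      f"vs ${chinese_tokens / 1e6 * price_per_mtok:.3f} (Chinese)")
print(f"Usable context, in English-equivalent tokens: {context_window / multiplier:,.0f}")
```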