Claude's Chinese tokens: the same content uses 65% more tokens than English, while on OpenAI it's only 15% more
According to Beating Monitoring, AI researcher Aran Komatsuzaki translated Rich Sutton's well-known essay "The Bitter Lesson" into nine languages and fed them into the tokenizers of six models: OpenAI, Gemini, Qwen, DeepSeek, Kimi, and Claude. Using the token count of the original English text on OpenAI's tokenizer as a 1x baseline, he compared how many times that count each language needed on each model. The results: the same content in Chinese consumed 1.65x the baseline on Claude but only 1.15x on OpenAI. Hindi was even more extreme on Claude, at more than 3x the baseline. Of the six models tested, Anthropic's came in last.
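A minimal sketch of that measurement, assuming the `tiktoken` and Hugging Face `transformers` packages are installed; the file names and the Qwen model ID are illustrative choices, not necessarily what Aran used, and Claude's and Gemini's counts would have to come from their providers' count-tokens APIs since those tokenizers are not published:

```python
# A rough reproduction of the comparison, not the original script.
import tiktoken
from transformers import AutoTokenizer

def openai_count(text: str) -> int:
    # OpenAI's o200k_base BPE (the GPT-4o encoding) via tiktoken.
    return len(tiktoken.get_encoding("o200k_base").encode(text))

def hf_count(model_id: str, text: str) -> int:
    # Token count under an open model's tokenizer pulled from Hugging Face.
    tok = AutoTokenizer.from_pretrained(model_id)
    return len(tok.encode(text, add_special_tokens=False))

# Placeholder files: the English original of "The Bitter Lesson"
# and a Chinese translation of the same text.
english = open("bitter_lesson_en.txt", encoding="utf-8").read()
chinese = open("bitter_lesson_zh.txt", encoding="utf-8").read()

baseline = openai_count(english)  # 1x reference: English original on OpenAI's tokenizer

counts = {
    "OpenAI": openai_count(chinese),
    # DeepSeek's and Kimi's open tokenizers can be loaded the same way
    # with their respective Hugging Face model IDs.
    "Qwen": hf_count("Qwen/Qwen2.5-7B-Instruct", chinese),
}
for model, n in counts.items():
    print(f"{model}: Chinese text = {n / baseline:.2f}x the English baseline")
```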
Translation changes the length of a text, so the multiples relative to English are not perfectly precise. More convincing is how the same Chinese paragraph performs across different models (still against the same baseline): Kimi used only 0.81x (fewer tokens than the English original), Qwen used 0.85x, and Claude rose to 1.65x. The text is identical; the gap is purely tokenizer efficiency. Chinese models tokenize Chinese even more compactly than English, which shows the problem is not Chinese itself but whether the tokenizer has been optimized for the language.
For users, more tokens mean the API costs more, the model takes longer to start responding, and the context window fills up faster. Tokenizer efficiency tracks each language's share of the training data: with more English data, English words get merged into longer, more efficient tokens; with less non-English data, the same text can only be cut into smaller, more fragmented pieces. Aran's conclusion: whoever has the bigger market saves the most tokens.
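To make the downstream effect concrete, here is a back-of-the-envelope calculation using the article's 1.65x Chinese multiplier; the per-token price and context-window size are hypothetical placeholders, not any provider's actual figures:

```python
# What a 1.65x tokenizer multiplier means for cost and context,
# using made-up price and window figures purely for illustration.
multiplier = 1.65            # Chinese tokens vs. the English baseline (per the article)
price_per_mtok = 3.00        # hypothetical input price, USD per million tokens
context_window = 200_000     # hypothetical context window, in tokens

english_tokens = 10_000                       # a prompt measured in baseline tokens
chinese_tokens = english_tokens * multiplier  # same content, tokenized less efficiently

print(f"Input cost:  ${english_tokens / 1e6 * price_per_mtok:.3f} (English) "
      f"vs ${chinese_tokens / 1e6 * price_per_mtok:.3f} (Chinese)")
print(f"Usable context, in English-equivalent tokens: {context_window / multiplier:,.0f}")
```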