According to CoinWorld, MiniMax has disclosed how it traced the M2 series large models' inability to output the name "Ma Jiaqi." The root cause is that the tokenizer merged "Jiaqi" into a single token. The model learned that token during pre-training, but later dialogue training data contained very few samples with it, so updates driven by high-frequency tokens shifted the surrounding vector space and the model lost the ability to output low-frequency tokens such as this one. A full scan of the roughly 200k-token vocabulary found that about 4.9% of tokens had degraded significantly, with Japanese hit hardest at 29.7%, compared with Korean at 3.3%, Russian at 3.7%, Chinese at 3.9%, and English at 3.5%.

CoinNetwork

2026-05-09 07:07:30


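According to CoinWorld, MiniMax has published a technical blog post disclosing how it traced the root cause of its M2 series large models being unable to output the personal name "马嘉祺" (Ma Jiaqi). The investigation started from a single case and ultimately revealed a systematic degradation affecting the entire vocabulary. The root cause is that the tokenizer, when it was trained, merged "嘉祺" (Jiaqi) into a single standalone token. During pre-training the model saw large amounts of internet text and learned this token, but the dialogue data used in subsequent training contained fewer than 5 samples that included "嘉祺". During that subsequent training, high-frequency tokens such as tool_call markers and code symbols kept updating the surrounding vector space, pushing low-frequency tokens like "嘉祺" in the wrong direction. The model still "knows" Ma Jiaqi and can answer questions about him accurately; what it lost is only the ability to output this token. The team then ran a full scan of the complete vocabulary of about 200k tokens and found that about 4.9% of tokens had degraded significantly. Japanese was hit hardest: 29.7% of Japanese tokens degraded significantly, far more than Korean (3.3%), Russian (3.7%), Chinese (3.9%), and English (3.5%).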
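The report does not describe how the vocabulary scan was implemented. As a rough illustration only, a per-language degradation rate like the ones quoted above could be tallied as in the sketch below. Everything in it is an assumption made for the example: the `before` and `after` score dictionaries (some per-token measure of how reliably the model can emit a token before and after later training), the 50% relative-drop threshold, and the script-based `language_of` heuristic, which among other things would count kanji-only Japanese tokens as Chinese.

```python
from collections import defaultdict


def language_of(token):
    """Very rough script-based language guess (hypothetical heuristic)."""
    if any("\u3040" <= ch <= "\u30ff" for ch in token):   # hiragana / katakana
        return "ja"
    if any("\uac00" <= ch <= "\ud7a3" for ch in token):   # hangul syllables
        return "ko"
    if any("\u0400" <= ch <= "\u04ff" for ch in token):   # Cyrillic
        return "ru"
    if any("\u4e00" <= ch <= "\u9fff" for ch in token):   # CJK ideographs
        return "zh"
    if any(ch.isascii() and ch.isalpha() for ch in token):
        return "en"
    return "other"


def degradation_by_language(before, after, max_relative_drop=0.5):
    """Return, per language, the share of tokens whose score fell sharply.

    before / after map token -> emit score (assumed metric, higher is better).
    A token counts as degraded when its score dropped by more than
    max_relative_drop relative to its earlier value (assumed threshold).
    """
    totals = defaultdict(int)
    degraded = defaultdict(int)
    for token, base in before.items():
        lang = language_of(token)
        totals[lang] += 1
        new = after.get(token, 0.0)
        if base > 0 and (base - new) / base > max_relative_drop:
            degraded[lang] += 1
    return {lang: degraded[lang] / totals[lang] for lang in totals}


# Toy scores, invented purely for illustration (not MiniMax's data).
before = {"嘉祺": 0.9, "the": 0.9, "こんにちは": 0.8, "ありがとう": 0.8}
after = {"嘉祺": 0.05, "the": 0.88, "こんにちは": 0.1, "ありがとう": 0.7}
rates = degradation_by_language(before, after)
print(rates)
```

A real scan would need an actual emit metric computed from the model and a more careful way of assigning tokens to languages; the sketch only shows the bookkeeping that turns per-token results into per-language percentages.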



MiniMax: Root Cause Analysis of Why Large Models Cannot Output the Name "Ma Jiaqi"
