Xiaomi Reveals Training Details of 1T Model MiMo-V2-Pro: Thousands of GPUs Used, No Job Levels or Deadlines

According to monitoring by Dongcha Beating, Luo Fuli, head of Xiaomi's large model team, disclosed in her first in-depth interview that the MiMo-V2-Pro base model has a total parameter count of 1T and was trained on thousands of GPUs. In her view, 1T parameters is the minimum scale for approaching the performance of Claude Opus 4.6 and securing a place in the next phase of the agent competition.

On the technical side, the Pro version pushes the interleaving of sliding-window and global attention layers to an extremely sparse 7:1 ratio, keeping long-context inference costs under control even as the parameter count grows, and retains the MTP (Multi-Token Prediction) architecture, trading surplus compute for faster inference. Both ideas are sketched in the code examples below.

On the management side, only about 30 to 40 of the hundred-member MiMo team work directly on core iterations, and there are no formal job levels, fixed group divisions, or delivery deadlines. When numerical instability appears, such as a sudden jump in training loss, the team halts training to troubleshoot, even if that means pausing for one or two weeks and writing off millions in compute costs.
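The interview does not give MiMo-V2-Pro's actual layer schedule, so the following is a minimal sketch of what a 7:1 sliding-window-to-global interleave implies for KV-cache cost. All names and constants here (WINDOW, PATTERN, the layer count) are illustrative assumptions, not Xiaomi's configuration.

```python
# Hypothetical sketch of a 7:1 sliding-window / global attention interleave.
# All numbers are illustrative; MiMo-V2-Pro's real config is not public.

WINDOW = 4096   # assumed sliding-window size (illustrative)
PATTERN = 8     # one global layer per 7 sliding-window layers -> 7:1 ratio

def layer_kind(layer_idx: int) -> str:
    """Every 8th layer attends globally; the rest use a local window."""
    return "global" if layer_idx % PATTERN == PATTERN - 1 else "sliding_window"

def kv_cache_tokens(layer_idx: int, seq_len: int) -> int:
    """KV-cache entries a layer must keep for a sequence of seq_len tokens."""
    if layer_kind(layer_idx) == "global":
        return seq_len               # full history
    return min(seq_len, WINDOW)      # only the last WINDOW tokens

if __name__ == "__main__":
    seq_len, n_layers = 128_000, 48
    total = sum(kv_cache_tokens(i, seq_len) for i in range(n_layers))
    dense = n_layers * seq_len       # cost if every layer were global
    # Ratio works out to (1 + 7*WINDOW/seq_len) / 8 -> ~15% here.
    print(f"KV cache vs. all-global: {total / dense:.1%}")
```

The point of the sparse interleave is visible in the output: for long contexts, most layers only cache a fixed window, so memory and attention cost grow far slower than in an all-global stack.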
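Likewise, the internals of Xiaomi's MTP heads are not public; this sketch shows only the generic draft-and-verify bookkeeping by which multi-token prediction can accelerate decoding. `predict_next` and `mtp_draft` are hypothetical stand-ins, and the sequential verification loop is for clarity only: in a real system the whole draft is verified in a single batched forward pass, which is where the surplus compute pays off.

```python
# Hypothetical sketch of MTP-style self-speculative decoding.
# predict_next() and mtp_draft() are made-up stand-ins, not MiMo's API.

from typing import Callable, List

def mtp_decode(
    predict_next: Callable[[List[int]], int],     # main next-token head
    mtp_draft: Callable[[List[int]], List[int]],  # MTP heads: k-token lookahead
    prompt: List[int],
    max_new: int,
) -> List[int]:
    tokens = list(prompt)
    produced = 0
    while produced < max_new:
        # Cheap guess of the next k tokens (assumed non-empty).
        draft = mtp_draft(tokens)
        # Verify against the main head; keep the longest agreeing prefix.
        # NOTE: shown sequentially for clarity. In practice all k positions
        # are verified in ONE batched forward pass, so each accepted draft
        # token costs far less than a full decoding step.
        for guess in draft:
            target = predict_next(tokens)
            tokens.append(target)
            produced += 1
            if target != guess or produced >= max_new:
                break  # first mismatch (or budget hit) ends this round
    return tokens

if __name__ == "__main__":
    # Toy model: next token is always (last token + 1) mod 50.
    nxt = lambda ts: (ts[-1] + 1) % 50
    drafts = lambda ts: [(ts[-1] + i) % 50 for i in range(1, 4)]
    print(mtp_decode(nxt, drafts, prompt=[0], max_new=10))
```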
