Xiaomi reveals training details of its 1T-parameter model MiMo-V2-Pro: thousands of GPUs, no ranks, no deadlines


ME News Report, April 24 (UTC+8). According to BlockBeats, Luo Fuli, head of Xiaomi's large model team, disclosed in her first in-depth interview that the MiMo-V2-Pro base model has a total parameter count of 1 trillion and was trained on thousands of GPUs. She believes that the 1-trillion-parameter scale is the minimum required to approach the performance of Claude Opus 4.6 and secure a spot in the next stage of the Agent competition.
On the technical side, the Pro version pushes the mix of sliding-window to global attention to an extremely sparse 7:1 ratio, keeping long-context inference costs under control while scaling up parameters, and it retains the MTP (Multi-Token Prediction) architecture, using surplus compute to accelerate inference.
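The cost argument behind that 7:1 mix can be sketched numerically. The layer layout, window size, and layer count below are illustrative assumptions, not Xiaomi's published configuration; the point is only that when most layers attend within a fixed window, per-layer cost grows linearly rather than quadratically in sequence length.

```python
# Hedged sketch: interleaving sliding-window and global attention layers
# at a 7:1 ratio, and the resulting attention cost at long context.
# WINDOW, the 32-layer depth, and the interleaving rule are assumptions.

WINDOW = 4096  # assumed sliding-window size, in tokens

def layer_pattern(num_layers: int, sliding_per_global: int = 7) -> list[str]:
    """One 'global' layer after every `sliding_per_global` 'sliding' layers."""
    pattern = []
    for i in range(num_layers):
        if (i + 1) % (sliding_per_global + 1) == 0:
            pattern.append("global")
        else:
            pattern.append("sliding")
    return pattern

def attention_cost(pattern: list[str], seq_len: int) -> int:
    """Count of query-key pairs scored (a proxy for attention FLOPs)."""
    cost = 0
    for kind in pattern:
        if kind == "global":
            cost += seq_len * seq_len               # full quadratic attention
        else:
            cost += seq_len * min(seq_len, WINDOW)  # linear once seq_len > WINDOW
    return cost

pattern = layer_pattern(32)
print(pattern.count("sliding"), pattern.count("global"))   # 28 sliding, 4 global
dense = attention_cost(["global"] * 32, 128_000)
sparse = attention_cost(pattern, 128_000)
print(f"cost vs. all-global attention: {sparse / dense:.3f}")  # → 0.153
```

At an assumed 128k-token context, the 7:1 layout scores about 15% of the query-key pairs an all-global stack would, which is the sense in which sparsity "controls inference costs for long texts" while total parameters grow.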
On the management side, only about thirty to forty people within the hundred-person MiMo team are directly involved in core iterations. The team has no hierarchical ranks, no fixed subgroup divisions, and no hard delivery deadlines. When numerical instabilities arise, such as sudden spikes in training loss, the team opts to halt training outright for troubleshooting, even if that means stopping for a week or two and writing off millions in compute costs.
(Source: BlockBeats)
