MLE Bench at 66.6%, close to Gemini 3.1. Reaching this level with 9.8B active parameters per token is impressive. The details of the windowed FIFO and the prefix-tree merging are well worth a close read. As for long-sequence training, MiniMax has ground its way through the problem by tackling it head-on.

BlockBeatNews
Decoding the hidden card: MiniMax releases the M2 technical report, detailing the MoE base and Agent training system
This article reviews MiniMax's M2 series technical report. It describes the trade-offs behind the move from M1's hybrid linear attention to full attention, as well as the roles of MTP, Sigmoid routing, and Forge in reducing training and inference costs. The report discloses for the first time both Forge, used for long-sequence Agent RL, and the M2.7 self-evolution mechanism. Forge uses windowed FIFO and prefix tree merging, achieving up to a 40-fold speedup in training on long sequences. The M2.7 self-evolution closed loop can complete over 100 rounds of analysis, code modification, evaluation, and rollback, yielding an improvement of approximately 30%. With 9.8B parameters activated per token, the model scores 56.22% on SWE-Pro and 66.6% on MLE Bench, approaching Gemini 3.1.
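To give a rough feel for why prefix tree merging can save work on long sequences, here is a minimal sketch. It is not MiniMax's Forge implementation, which the summary does not describe in detail. It only assumes that several agent rollouts share a common token prefix, inserts them into a trie, and compares the token count before and after shared prefixes are merged. The names `TrieNode`, `count_unique_prefix_tokens`, and the token IDs are all hypothetical.

```python
from typing import Dict, List


class TrieNode:
    """One token position in the prefix tree (illustrative only)."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}


def count_unique_prefix_tokens(sequences: List[List[int]]) -> int:
    """Return the number of trie nodes, i.e. the token positions that
    remain once identical prefixes across sequences are stored once."""
    root = TrieNode()
    nodes = 0
    for seq in sequences:
        node = root
        for tok in seq:
            if tok not in node.children:
                node.children[tok] = TrieNode()
                nodes += 1
            node = node.children[tok]
    return nodes


# Three hypothetical rollouts sharing a common context (made-up token IDs).
rollouts = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 6],
    [1, 2, 3, 7],
]

naive_tokens = sum(len(r) for r in rollouts)           # every rollout processed separately
merged_tokens = count_unique_prefix_tokens(rollouts)   # shared prefixes counted once

print(f"naive: {naive_tokens}, merged: {merged_tokens}")
```

In this toy case the shared prefix is counted once instead of three times, so the merged tree holds half as many token positions. The longer the shared context relative to the diverging tails, the larger the saving, which is presumably the effect the reported speedup relies on.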