A reasoning monster bred from 340k trajectories: the name SU-01 is appropriately low-key.

MeNews
The post-trained reasoning model SU-01 achieves gold-medal performance on Olympiad-level problems
AIMPACT proposes a systematic recipe for turning post-trained reasoning models into Olympiad-level problem solvers, in three steps: supervised fine-tuning with a reverse-perplexity curriculum to teach proof search and self-checking; two-stage reinforcement learning to extend those abilities; and test-time scaling at inference. Applied to the 30B-A3B backbone, the recipe uses approximately 340k sub-8K trajectories for supervised fine-tuning, followed by 200 steps of RL, yielding SU-01. The model reasons stably on hard problems, with trajectories exceeding 100k tokens. It achieves gold-medal-level performance on competitions such as IMO, USAMO, and IPhO, and its scientific reasoning generalizes to fields beyond mathematics and physics.
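The post gives no detail on how the reverse-perplexity curriculum is built, so the following is only a rough sketch of one plausible reading: score each trajectory by its perplexity under some reference model, drop trajectories at or above the length cap, and present the highest-perplexity samples first. The `Trajectory` type, the function names, the 8192-token cap (reading "sub-8K" as a token count), and the high-to-low ordering are all assumptions made for illustration, not the method behind SU-01.

```python
import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical container for one SFT sample."""
    text: str
    token_logprobs: list[float]  # natural-log prob of each token under a reference model


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("cannot compute perplexity of an empty sequence")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def reverse_perplexity_curriculum(
    trajectories: list[Trajectory],
    max_tokens: int = 8192,  # assumed meaning of "sub-8K"
) -> list[Trajectory]:
    """Keep trajectories shorter than max_tokens, ordered from highest to
    lowest perplexity (assumed meaning of "reverse")."""
    kept = [t for t in trajectories if len(t.token_logprobs) < max_tokens]
    return sorted(kept, key=lambda t: perplexity(t.token_logprobs), reverse=True)


if __name__ == "__main__":
    samples = [
        Trajectory("easy", [math.log(0.9)] * 4),
        Trajectory("hard", [math.log(0.1)] * 4),
        Trajectory("medium", [math.log(0.5)] * 4),
    ]
    for t in reverse_perplexity_curriculum(samples):
        print(t.text, round(perplexity(t.token_logprobs), 2))
```

If the intended curriculum instead runs from low to high perplexity, only the `reverse=True` argument would change; the scoring step stays the same.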