Post-trained reasoning model SU-01 achieves gold-medal performance on Olympiad-level problems

AIMPACT News, May 16 (UTC+8): a new paper proposes a systematic method for converting post-trained reasoning models into Olympiad-level problem solvers, and trains the SU-01 model on this basis.
The approach has three steps: first, supervised fine-tuning with a reverse perplexity curriculum to instill rigorous proof search and self-checking behaviors;
then, two-stage reinforcement learning to extend these behaviors (transitioning from reinforcement learning with verifiable rewards to proof-level reinforcement learning);
finally, test-time scaling to further improve performance.
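The first step above orders the fine-tuning data by difficulty. The paper's exact schedule is not given in this summary, but a minimal sketch of one reading of a "reverse perplexity curriculum" (present examples the reference model finds hardest first) might look like this; `score_fn` is a hypothetical stand-in for a reference model's per-token log-probabilities:

```python
import math

def perplexity(trajectory, score_fn):
    """Mean per-token perplexity of a trajectory, given a function
    that returns a log-probability for each token. Here score_fn
    is a toy stand-in for a real reference model."""
    logprobs = [score_fn(tok) for tok in trajectory]
    return math.exp(-sum(logprobs) / len(logprobs))

def reverse_perplexity_order(trajectories, score_fn):
    """Order SFT examples from highest perplexity (hardest for the
    reference model) down to lowest. This is one plausible reading
    of a 'reverse perplexity curriculum', not the paper's verified
    procedure."""
    return sorted(trajectories, key=lambda t: -perplexity(t, score_fn))
```

A curriculum like this would then be consumed in order by the fine-tuning loop, rather than shuffling the dataset uniformly.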
The research team applied the method to a 30B-A3B backbone model, using approximately 340k trajectories of under 8K tokens for supervised fine-tuning, followed by 200 steps of reinforcement learning, to obtain SU-01.
The resulting model reasons stably on difficult problems, with trajectory lengths exceeding 100k tokens; it reaches gold-medal level in competitions such as IMO 2025, USAMO 2026, and IPhO 2024/2025, and shows generalization to scientific reasoning beyond mathematics and physics. (Source: InfoQ)
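The summary does not specify how the test-time scaling step works; a common generic recipe is self-consistency, i.e. sampling several candidate answers and taking a majority vote. As an illustrative assumption only (not SU-01's documented procedure):

```python
from collections import Counter

def self_consistency(sample_fn, n=8):
    """Test-time scaling via self-consistency: draw n candidate
    answers from a sampler and return the most frequent one.
    sample_fn is a hypothetical callable wrapping one model
    generation; ties break by first occurrence."""
    answers = [sample_fn() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

Spending more samples per question at inference time trades compute for accuracy, which is the general idea behind the "test-time scaling" step described above.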
