DeepSeek V4 scores a perfect 120 on Putnam 2025, with formal mathematical reasoning on par with Axiom

According to Beating Monitoring, DeepSeek has announced two sets of formal mathematical reasoning evaluation results for V4, both built on Putnam problems. The William Lowell Putnam Mathematical Competition (the Putnam) is the highest-level undergraduate mathematics competition in North America.

In the practical scenario, V4-Flash-Max scored 81.00 on the Putnam-200 Pass@8 benchmark, using the open-source tool LeanExplore and limited sampling. By comparison, Seed-2.0-Prover scored 35.50, while Gemini 3 Pro and Seed-1.5-Prover each scored 26.50.
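For readers unfamiliar with the metric, Pass@8 is commonly read as "a problem counts as solved if at least one of 8 sampled attempts passes verification." The sketch below illustrates that reading on made-up data. The exact scoring protocol behind the Putnam-200 numbers is not stated in the report, so the function name `pass_at_k`, the percentage scale, and the toy results are all assumptions for illustration.

```python
from typing import Sequence


def pass_at_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Percentage of problems with at least one verified attempt among the first k.

    `results[i][j]` is True when attempt j on problem i passed verification
    (for example, the Lean checker accepted the proof). Hypothetical helper.
    """
    if not results:
        raise ValueError("results must contain at least one problem")
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return 100.0 * solved / len(results)


# Toy data (not real benchmark output): 4 problems, 8 attempts each.
toy_results = [
    [False, False, True, False, False, False, False, False],  # solved on attempt 3
    [False] * 8,                                               # never solved
    [True] + [False] * 7,                                      # solved on attempt 1
    [False] * 7 + [True],                                      # solved on attempt 8
]

score = pass_at_k(toy_results, k=8)
print(f"Pass@8 on toy data: {score:.2f}")
```

With a small sampling budget such as 8, the metric rewards models that find a valid proof quickly, which is why the report treats it as closer to routine deployment.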

In the frontier scenario, V4 adopts a hybrid formal-informal reasoning approach with three stages: it first generates candidate natural-language solutions through informal reasoning, then filters them through self-verification, and finally has a formal agent complete rigorous proofs in Lean. V4 achieved a perfect score of 120/120 on Putnam-2025, tying for first place with Axiom and surpassing Seed-1.5-Prover’s 110/120 and Aristotle’s 100/120. The frontier scenario relies on large-scale compute scaling, while the practical scenario results better reflect routine deployment capabilities.
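The generate, filter, formalize flow described above can be sketched as a small control loop. This is only an illustration of the three stages named in the report, not DeepSeek's implementation: the function names, the score threshold, and the toy stand-ins for the model calls are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    text: str                # informal natural-language solution
    self_check_score: float  # confidence assigned by self-verification


def solve(
    problem: str,
    generate: Callable[[str, int], List[str]],
    self_verify: Callable[[str, str], float],
    formalize: Callable[[str, str], Optional[str]],
    n_candidates: int = 4,
    threshold: float = 0.5,  # assumed cutoff, not from the report
) -> Optional[str]:
    """Return the first formal proof obtained from a surviving candidate, else None."""
    # Stage 1: informal reasoning proposes candidate solutions.
    candidates = [
        Candidate(text, self_verify(problem, text))
        for text in generate(problem, n_candidates)
    ]
    # Stage 2: self-verification filters and ranks the candidates.
    kept = sorted(
        (c for c in candidates if c.self_check_score >= threshold),
        key=lambda c: c.self_check_score,
        reverse=True,
    )
    # Stage 3: a formal agent tries to turn each survivor into a checked proof.
    for candidate in kept:
        proof = formalize(problem, candidate.text)
        if proof is not None:
            return proof
    return None


# Toy stand-ins for the model and the Lean agent (purely illustrative).
def toy_generate(problem: str, n: int) -> List[str]:
    return [f"sketch-{i}" for i in range(n)]


def toy_self_verify(problem: str, text: str) -> float:
    return {"sketch-0": 0.2, "sketch-1": 0.9, "sketch-2": 0.7}.get(text, 0.0)


def toy_formalize(problem: str, text: str) -> Optional[str]:
    # Pretend only one sketch survives formal checking.
    return f"lean-proof-from-{text}" if text == "sketch-2" else None


result = solve("toy problem", toy_generate, toy_self_verify, toy_formalize)
print(result)
```

In the toy run the highest-scoring sketch fails formalization and the next one succeeds, which shows why the formal stage acts as the final arbiter: self-verification only decides what is worth the cost of a Lean proof attempt.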
