DeepSeek V4 Achieves Perfect Score of 120 at Putnam-2025, Matching Axiom in Formal Mathematical Reasoning

According to monitoring by Dongcha Beating, DeepSeek has released two sets of formal mathematical reasoning evaluations for V4. The Putnam Competition is the highest-level undergraduate mathematics competition in North America. In the Practical Regime, V4-Flash-Max scored 81.00 on the Putnam-200 benchmark at Pass@8, using the open-source tool LeanExplore and constrained sampling. By comparison, Seed-2.0-Prover scored 35.50, while Gemini 3 Pro and Seed-1.5-Prover each scored 26.50. In the Frontier Regime, V4 used a hybrid formal-informal reasoning approach: it first generated candidate natural-language solutions through informal reasoning, filtered them through self-validation, and then had a formal agent prove the surviving candidates rigorously in Lean. V4 achieved a perfect score of 120/120 on Putnam-2025, tying for first place with Axiom and surpassing Seed-1.5-Prover (110/120) and Aristotle (100/120). The Frontier Regime relied on large-scale compute scaling, whereas the Practical Regime results better reflect capabilities under conventional deployment.
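The three-stage Frontier Regime flow described above (informal candidates, self-validation filter, formal proof in Lean) can be pictured with a minimal Python sketch. This is an illustration of the control flow only, not DeepSeek's implementation: the function `hybrid_prove`, its three callback parameters, and the toy stubs are all hypothetical names introduced here, and the "verifier" is a stand-in that returns a Lean-style string rather than calling Lean.

```python
from typing import Callable, Iterable, Optional


def hybrid_prove(
    problem: str,
    propose: Callable[[str], Iterable[str]],
    self_check: Callable[[str, str], bool],
    formalize_and_verify: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return the first formally verified proof, or None if none is found.

    All three callbacks are hypothetical placeholders:
      propose              -> yields informal natural-language candidates
      self_check           -> cheap filter that discards weak candidates
      formalize_and_verify -> returns a checked formal proof, or None
    """
    for candidate in propose(problem):
        # Stage 2: drop candidates that fail self-validation.
        if not self_check(problem, candidate):
            continue
        # Stage 3: only surviving candidates reach the expensive formal step.
        proof = formalize_and_verify(problem, candidate)
        if proof is not None:
            return proof
    return None


# Toy stubs (assumptions for illustration; no model or Lean is involved).
def toy_propose(problem: str) -> Iterable[str]:
    return ["answer: 3", "answer: 4"]


def toy_self_check(problem: str, candidate: str) -> bool:
    return candidate.endswith("4")


def toy_verify(problem: str, candidate: str) -> Optional[str]:
    if candidate == "answer: 4":
        return "theorem two_add_two : 2 + 2 = 4 := rfl"
    return None


result = hybrid_prove("What is 2 + 2?", toy_propose, toy_self_check, toy_verify)
print(result)
```

The ordering reflects the design described in the report: the informal stage is cheap and broad, while formal verification is the final gate, so a result only counts once a proof has been checked.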
