V4 scored a perfect 120 on Putnam 2025, with formal mathematical reasoning on par with Axiom


ME News Report, April 24 (UTC+8): According to Beating Monitoring, V4 has published two sets of formal mathematical reasoning evaluations. The Putnam Competition is the highest-level undergraduate mathematics competition in North America. In the practical regime, V4-Flash-Max scored 81.00 on the Putnam-200 Pass@8 benchmark, using the open-source tool LeanExplore and limited sampling. By comparison, Seed-2.0-Prover scored 35.50, and Gemini 3 Pro and Seed-1.5-Prover both scored 26.50. In the frontier regime, V4 takes a hybrid formal-informal reasoning approach: it first generates candidate natural-language solutions through informal reasoning, filters them through self-verification, and then has a formal agent complete rigorous proofs in Lean. V4 achieved a perfect score of 120/120 on Putnam-2025, tying Axiom for first place and surpassing Seed-1.5-Prover (110/120) and Aristotle (100/120). The frontier regime relied on large-scale compute scaling, while the practical-regime results better reflect routine deployment capabilities. (Source: BlockBeats)
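
The frontier-regime flow described above (generate informal candidates, filter by self-verification, then formalize in Lean) can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions, not V4's actual implementation: the function `hybrid_prove`, its parameters, and the toy stand-ins below are all hypothetical names introduced here, and a real system would call a language model and a Lean checker where the stand-ins return canned strings.

```python
from typing import Callable, Iterable, Optional


def hybrid_prove(
    problem: str,
    generate_informal: Callable[[str, int], Iterable[str]],
    self_verify: Callable[[str, str], bool],
    formalize_in_lean: Callable[[str, str], Optional[str]],
    num_candidates: int = 8,
) -> Optional[str]:
    """Hypothetical sketch of a generate -> filter -> formalize loop.

    All three callables are assumptions supplied by the caller; none of
    them corresponds to a real V4, Lean, or LeanExplore API.
    """
    # Step 1: draft candidate natural-language solutions.
    for candidate in generate_informal(problem, num_candidates):
        # Step 2: discard drafts that fail the self-verification check.
        if not self_verify(problem, candidate):
            continue
        # Step 3: ask the formal agent for a proof; None means it failed.
        proof = formalize_in_lean(problem, candidate)
        if proof is not None:
            return proof
    # No candidate survived both the filter and formalization.
    return None


# Toy stand-ins (purely illustrative) so the sketch can be run end to end.
def toy_generate(problem: str, n: int) -> list:
    return [f"sketch {i}" for i in range(n)]


def toy_verify(problem: str, candidate: str) -> bool:
    # Pretend only the draft numbered 3 passes self-verification.
    return candidate.endswith("3")


def toy_formalize(problem: str, candidate: str) -> Optional[str]:
    return f"-- Lean proof derived from: {candidate}"
```

Calling `hybrid_prove("toy problem", toy_generate, toy_verify, toy_formalize)` walks the eight toy drafts and returns the proof string built from the one draft that passes the filter. With a smaller sampling budget such as `num_candidates=3`, no draft passes and the function returns `None`, which loosely mirrors why a limited-sampling regime can score lower than one with a large compute budget.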
