V4 has released results for two sets of formal reasoning evaluations. In the practical regime, V4-Flash-Max scores 81.00 on Putnam-200 Pass@8, outperforming Seed-2.0-Prover, Gemini 3 Pro, and Seed-1.5-Prover. The frontier regime uses hybrid reasoning: the model first generates natural-language solutions and self-verifies them, then completes rigorous proofs in Lean. On Putnam-2025, V4 scores a perfect 120/120, tying for first place with Axiom and finishing ahead of Seed-1.5-Prover and Aristotle.

MeNews

2026-04-24 05:23:20

ME News Report, April 24 (UTC+8): According to Beating Monitoring, V4 has announced results for two sets of formal mathematical reasoning evaluations. The Putnam Competition is the highest-level undergraduate mathematics competition in North America. In the practical regime, V4-Flash-Max scored 81.00 on the Putnam-200 Pass@8 benchmark using the open-source tool LeanExplore and limited sampling. By comparison, Seed-2.0-Prover scored 35.50, while Gemini 3 Pro and Seed-1.5-Prover each scored 26.50. In the frontier regime, V4 adopts a hybrid formal-informal approach: it first generates candidate natural-language solutions through informal reasoning, filters them through self-verification, and then has a formal agent complete rigorous proofs in Lean. V4 achieved a perfect 120/120 on Putnam-2025, tying for first place with Axiom and surpassing Seed-1.5-Prover’s 110/120 and Aristotle’s 100/120. The frontier regime relied on large-scale scaling of compute, whereas the practical-regime results better reflect routine deployment capabilities. (Source: BlockBeats)
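
The report does not say how the Pass@8 figure is computed. As a rough illustration only, and assuming the common convention that a problem counts as solved when at least one of k sampled proofs is verified, the widely used unbiased pass@k estimator can be sketched as below. The function name `pass_at_k` and the sample counts are hypothetical and are not taken from the V4 evaluation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct,
    given that c out of n generated samples were correct.

    Illustrative helper; not part of any V4 or Lean tooling.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # C(n - c, k) / C(n, k) is the probability that k samples drawn
    # without replacement from the n attempts are all incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical problem: 16 proof attempts, 1 verified, evaluated at k = 8.
    print(pass_at_k(n=16, c=1, k=8))
```

A benchmark-level number would then be aggregated over all problems; how V4 aggregates its Putnam-200 score is not stated in the source.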


V4 scored a perfect 120 on Putnam 2025, with formal mathematical reasoning on par with Axiom
