According to Beating Monitoring, Google DeepMind has released AI co-mathematician, a multi-agent interactive research workstation for mathematicians. The system scored 47.9% on FrontierMath Tier 4, currently the most difficult research-level math benchmark, solving 23 of 48 problems and surpassing the previous record of 39.6% set by GPT-5.5 Pro.

The system does not use a new-generation base model; it runs on Gemini 3.1 Pro. On its own, that model scored only 19% on Tier 4; with the agent framework added, its score more than doubled. DeepMind built a multi-layer architecture for it: at the top, a “project coordinator” breaks research tasks down into multiple workflows, which are then distributed to sub-agents responsible for literature retrieval, coding, and reasoning. Generated proofs must pass review by multiple “review agents” before submission. The result indicates that, in top-tier mathematical reasoning, the gains from this kind of heavy orchestration scaffolding may be greater than those from model upgrades.

Epoch AI ran the evaluation as a blind test: to prevent cheating, the DeepMind team could not see the questions at any point, and each problem was allowed 48 hours of run time. The system not only topped the leaderboard but also solved three problems that no previous model had solved.

Although it is called an assistant, it functions more like a brainstorming colleague. Group theory expert Marc Lackenby used it in actual research to resolve an open conjecture from the Kourovka Notebook. Interestingly, the system’s initial strategy was flagged as “flawed” by its own review agent, but Lackenby recognized the clever idea hidden in the flawed approach, filled in the gaps himself, and ultimately completed the proof. For now, AI co-mathematician is available only to a small number of mathematicians for internal testing.
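The layered design described above (a coordinator that splits a task into workflows, sub-agents for literature retrieval, coding, and reasoning, and a review gate before submission) can be illustrated with a minimal Python sketch. This is not DeepMind's implementation: every name here (`Workflow`, `Review`, `coordinate`, `run_pipeline`, and the stub agents) is hypothetical, the agents are placeholder functions rather than model calls, and the rule that every reviewer must approve is an assumption, since the article only says proofs must pass review.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Workflow:
    kind: str    # which sub-agent handles it: "literature", "coding", "reasoning"
    prompt: str  # the sub-task handed to that agent


@dataclass
class Review:
    reviewer: str
    approved: bool
    note: str = ""


SubAgent = Callable[[str], str]
Reviewer = Callable[[str], Review]
KINDS = ("literature", "coding", "reasoning")


def coordinate(task: str) -> List[Workflow]:
    """Stand-in for the project coordinator: one workflow per sub-agent kind."""
    return [Workflow(kind, f"[{kind}] {task}") for kind in KINDS]


def run_pipeline(
    task: str, sub_agents: Dict[str, SubAgent], reviewers: List[Reviewer]
) -> Tuple[str, List[Review], bool]:
    """Dispatch workflows, assemble a draft, and gate submission on review."""
    findings = [sub_agents[w.kind](w.prompt) for w in coordinate(task)]
    draft = "\n".join(findings)  # a real system would synthesize a proof here
    reviews = [review(draft) for review in reviewers]
    # Assumption: the draft is submitted only if every reviewer approves.
    submitted = all(r.approved for r in reviews)
    return draft, reviews, submitted


# Hypothetical stubs so the sketch runs without any model behind it.
def echo_agent(prompt: str) -> str:
    return f"result for {prompt}"


def lenient_reviewer(draft: str) -> Review:
    return Review("lenient", True)


def strict_reviewer(draft: str) -> Review:
    ok = "gap" not in draft
    return Review("strict", ok, "" if ok else "flawed: unproven step")


agents = {kind: echo_agent for kind in KINDS}
panel = [lenient_reviewer, strict_reviewer]

draft, reviews, submitted = run_pipeline("bound the rank of G", agents, panel)
flagged_draft, flagged_reviews, flagged_submitted = run_pipeline(
    "close the gap in lemma 2", agents, panel
)
```

In this sketch a rejected draft is returned together with its reviews instead of being discarded, so a human can still inspect a flagged attempt, much as Lackenby did with the strategy the review agent marked as flawed.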


DeepMind releases AI math research assistant: Multi-agent architecture beats GPT-5.5 Pro and also solves previously "unsolvable" problems
