ARC-AGI-3: Chollet's new benchmark reveals that contemporary AI fundamentally cannot adapt on the fly

SnapshotBot · 2026-03-29T01:25:00+00:00

François Chollet released the ARC-AGI-3 benchmark to evaluate progress toward AGI, with an emphasis on a system's ability to adapt to new situations. Human testers solved every task on the first attempt (100%), while the action efficiency of top AI models is below 1%. The benchmark exposes a fundamental deficiency of current AI in real-time adaptability and pushes researchers toward structural changes in learning and adaptation mechanisms.


What Happened

François Chollet released ARC-AGI-3, a new benchmark for measuring progress in AGI.

Details

Chollet, the creator of Keras, has been working on how to measure intelligence since his 2019 paper “On the Measure of Intelligence.” His core argument: good benchmarks should expose the weaknesses of systems rather than endorse existing claims.
ARC-AGI-3 includes an “interactive reasoning” test that checks whether a system can explore an unfamiliar situation, try actions, and adjust using common sense.
The results are straightforward: human testers solved everything on the first try; the action efficiency of top AI models is below 1%.
The benchmark will continue to be updated: scores on previous versions jumped after improvements in model reasoning and coding abilities, so the benchmark must keep raising the bar to reveal what is still missing.

Humans vs. Current Models

Metric	Humans	Top AI Models
First-attempt solve rate / action efficiency	100%	<1%

Core Message: This is not a quantitative gap that fine-tuning can close; it is a fundamental lack of “on-the-spot adaptability.”
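The article does not define how “action efficiency” is computed. As a rough illustration only, the sketch below assumes one plausible reading: the ratio of the actions a human needed to the actions an agent needed on the same task, capped at 1 and set to 0 when the task is not solved. The function names, the formula, and the sample numbers are all hypothetical and are not the official ARC-AGI-3 scoring.

```python
from typing import Iterable, Tuple


def action_efficiency(human_actions: int, agent_actions: int, solved: bool) -> float:
    """Hypothetical score in [0, 1]: human action count divided by agent action count.

    Assumption, not the official metric: an unsolved task scores 0, and an
    agent that needs no more actions than the human baseline scores 1.
    """
    if human_actions <= 0 or agent_actions <= 0:
        raise ValueError("action counts must be positive")
    if not solved:
        return 0.0
    return min(1.0, human_actions / agent_actions)


def mean_efficiency(runs: Iterable[Tuple[int, int, bool]]) -> float:
    """Average the hypothetical score over (human_actions, agent_actions, solved) runs."""
    scores = [action_efficiency(h, a, s) for h, a, s in runs]
    if not scores:
        raise ValueError("need at least one run")
    return sum(scores) / len(scores)


# Made-up numbers for illustration only: an agent that eventually solves a
# task but takes 200x more actions than a human scores 0.5% on that task.
print(f"{action_efficiency(15, 3000, True):.3%}")
```

Under this reading, a score below 1% would mean an agent spends more than a hundred times as many actions as a human on the tasks it does solve, or fails most of them outright.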

Why It Matters

If a system requires extensive preparation to complete tasks that humans can “see at a glance,” that is a fundamental issue for the path to AGI and raises the question: are we measuring intelligence with the wrong metrics?
Chollet is not saying that current AI is poor. His point is that benchmarks which can be passed through memorization and pattern matching built up by scale tell us only a limited amount; benchmarks that assess “real adaptability in new situations” are closer to what we care about.
For researchers and developers, the signal from ARC-AGI-3 is clear: Simply scaling up does not close this gap; learning and adaptation mechanisms need structural change.

Impact Assessment

Importance: High
Category: AI Research, Technical Insights, Industry Trends

Conclusion: This is an early but crucial signal, and it is most valuable for researchers and builders: those who can innovate in learning and adaptation mechanisms will have the advantage, while purely transactional approaches are less relevant in this direction.
