ARC-AGI-3: Chollet's new benchmark reveals that contemporary AI fundamentally cannot adapt on the fly

What Happened

François Chollet released ARC-AGI-3, a new benchmark for measuring progress in AGI.

Details

  • Chollet, the creator of Keras, has been working on how to measure intelligence since his 2019 paper “On the Measure of Intelligence.” His core argument: a good benchmark should expose a system's weaknesses rather than endorse existing claims.
  • ARC-AGI-3 includes an “interactive reasoning” test that checks whether a system can apply common sense, try things out, and adapt in situations it has not seen before.
  • The results are straightforward: human testers solved everything on the first try; the action efficiency of top AI models is below 1%.
  • The benchmark will keep being updated: scores on earlier versions jumped once models' reasoning and coding abilities improved, so the bar must keep rising to reveal what is still missing.

Humans vs. Current Models

Metric                                      Humans   Top AI Models
First-Attempt Solving / Action Efficiency   100%     <1%

Core Message: The gap is not a matter of degree that fine-tuning can close; it is a fundamental lack of “on-the-spot adaptability.”
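
To make the “action efficiency” figure above more concrete, here is a minimal Python sketch of how such a ratio could be computed. It is an illustration only, not the benchmark's official scoring: the formula (human baseline actions divided by the agent's actions, capped at 1.0 and set to 0.0 for unsolved tasks), the function names, and the sample numbers are all assumptions introduced for this example.

```python
from typing import Iterable, Tuple


def action_efficiency(baseline_actions: int, agent_actions: int, solved: bool) -> float:
    """Hypothetical metric: share of the human action budget the agent needed.

    Assumption (not from the benchmark): efficiency = baseline / agent actions,
    capped at 1.0, and 0.0 when the task was not solved.
    """
    if baseline_actions <= 0 or agent_actions <= 0:
        raise ValueError("action counts must be positive")
    if not solved:
        return 0.0
    return min(1.0, baseline_actions / agent_actions)


def mean_efficiency(episodes: Iterable[Tuple[int, int, bool]]) -> float:
    """Average the hypothetical efficiency over (baseline, agent, solved) tuples."""
    scores = [action_efficiency(b, a, s) for b, a, s in episodes]
    if not scores:
        raise ValueError("need at least one episode")
    return sum(scores) / len(scores)


# Made-up numbers for illustration: a human needs 20 actions,
# while an agent needs 4000 to solve the same task.
print(action_efficiency(20, 4000, True))
```

Under these assumptions, an agent that needs 200 times as many actions as a human scores 0.5% on that task, which is the kind of gap a figure “below 1%” describes.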

Why It Matters

  • If a system needs extensive preparation to complete tasks that humans grasp at a glance, that is a fundamental problem for the path to AGI, and it raises a question: are we measuring intelligence with the wrong metrics?
  • Chollet is not saying that current AI is poor. His point is that benchmarks which reward memory and pattern matching built up through scale tell us only a limited amount; benchmarks that assess “real adaptability in new situations” are closer to what we care about.
  • For researchers and developers, the signal from ARC-AGI-3 is clear: Simply scaling up does not close this gap; learning and adaptation mechanisms need structural change.

Impact Assessment

  • Importance: High
  • Category: AI Research, Technical Insights, Industry Trends

Conclusion: This is an early but crucial signal, and it is most valuable for researchers and builders: those who can innovate in learning and adaptation mechanisms will have the advantage, while purely transactional approaches are less relevant in this direction.
