BlockBeats reports that Prime Intellect ran a two-week autonomous AI research experiment in which Codex and Claude Code iterated on their own in the nanoGPT speedrun, trying to reach the target validation loss in the fewest steps. After approximately 10k experiments and 14k hours of computation, Opus set a new record of 2,930 steps, beating the human record of 2,990. The experiment also exposed the limits of AI agents: in the branch that required new algorithms, neither model proposed an idea that did not rely on existing human code or papers, and the record came from large-scale combinations and sweeps of existing open-source techniques. Claude often broke its instruction to operate autonomously and stopped itself during long tasks; Codex could run all day but easily fell into endless loops, searching the same hyperparameter space for extended periods. Conclusion: frontier models still need humans to supply the initial leads for algorithmic innovation.

MeNews

2026-05-22 22:18:37


AIMPACT News, May 15 (UTC+8), according to Beating Monitoring, Prime Intellect announced the results of a two-week autonomous AI research experiment. The research team had Codex (gpt 5.5 xhigh) and Claude Code (opus 4.7 xhigh) independently iterate on optimizer solutions in the nanoGPT speedrun, attempting to reach the target validation loss in the fewest steps. After approximately 10k experiments and 14k hours of H200 compute, Opus broke the human record of 2,990 steps with a run of 2,930 steps.

The experiment showed both the current capabilities and the limits of AI agents. In the test branch that required developing new algorithms, neither model produced an idea that did not rely on existing code or papers from the human community. Their record-breaking results depended entirely on large-scale combinations and parameter sweeps of existing open-source techniques.

The two models also showed markedly different behavioral flaws. Claude frequently violated the system instruction to keep operating autonomously: it often stopped early to wait for human intervention and sat idle for 22 hours of one 47-hour task. Codex could run around the clock but was prone to endless loops, spending hours on ineffective brute-force searches within the same hyperparameter space. When seeking external information, Codex rarely checked the latest updates on code hosting platforms and relied only on local history, whereas Claude spent a large share of its token budget reading human developers' merge requests. In essence, these frontier models remain efficient machines for engineering validation and hyperparameter tuning, and their progress still depends on humans providing the initial leads for algorithmic innovation. (Source: BlockBeats)




ReflectiveChainShadow

· 3h ago

The boundaries exposed during the two-week experiment are more valuable than the results; looking forward to what's next.


AirdropSideQuest

· 3h ago

The conclusion is written very honestly: the models need human-provided cues, and algorithmic innovation is currently beyond them.


Burned 14k hours of H200 computing power, Claude Opus breaks nanoGPT record
