Stanford NLP Team Demonstrates New Advances in Automated AI Research

AIMPACT News, May 15 (UTC+8): The Stanford NLP team presented a new automated AI research project at ICML 2026. The team built automated executors that turn LLM pretraining and post-training into execution environments, then used the resulting execution feedback to make the automated research more effective. The study analyzed two ways of learning from that feedback. Evolutionary search proved sample-efficient: within ten search cycles it found a post-training method that outperformed the GRPO baseline (69.4% vs. 48.0%) and a pretraining recipe that outperformed the nanoGPT baseline (19.7 minutes vs. 35.9 minutes). Reinforcement learning on execution rewards, by contrast, suffered from mode collapse: it raised the average reward but not the upper limit. The work points to a direction for execution-oriented automated AI research. (Source: InfoQ)
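The brief gives no implementation details, so the following is only a minimal sketch of what an evolutionary search loop driven by execution feedback might look like. Everything in it is an assumption for illustration: `execute` is a toy scoring function standing in for a real training run, candidates are plain lists of numbers rather than research ideas, and the names `mutate` and `evolutionary_search` are hypothetical, not the Stanford team's code.

```python
import random

def execute(candidate):
    """Toy stand-in for an automated executor (assumption).

    A real executor would launch a training run and report its metric;
    here the score is just the negative squared distance to a made-up target.
    """
    target = [0.3, -1.2, 0.8]  # hypothetical optimum, illustration only
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

def mutate(candidate, rng, scale=0.3):
    """Propose a variation of an existing candidate by adding Gaussian noise."""
    return [c + rng.gauss(0.0, scale) for c in candidate]

def evolutionary_search(cycles=10, population=8, survivors=2, seed=0):
    """Keep the best-scoring candidates each cycle and mutate them."""
    rng = random.Random(seed)
    pool = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(population)]
    best_per_cycle = []
    for _ in range(cycles):
        # Execution feedback: score every candidate, best first.
        scored = sorted(pool, key=execute, reverse=True)
        parents = scored[:survivors]
        best_per_cycle.append(execute(parents[0]))
        # Survivors carry over unchanged; the rest are mutated copies.
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population - survivors)]
        pool = parents + children
    best = max(pool, key=execute)
    return best, best_per_cycle

if __name__ == "__main__":
    best, history = evolutionary_search()
    print("best candidate:", best)
    print("best score per cycle:", history)
```

Because the survivors are carried over unchanged, the best score in this sketch can never get worse from one cycle to the next. That is a property of this toy loop, not a claim about the method reported at ICML.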
MintColdBrew
· 2h ago
The work for ICML 2026 is already this intense.
QuietRugAlarm
· 3h ago
19 minutes vs 36 minutes, nanoGPT was completely beaten.
Half-MeltedIceCreamPosition
· 3h ago
Evolutionary search beats GRPO. This efficiency improvement is a bit outrageous.
AirdropOnTheDune
· 3h ago
An integrated environment for pre-training and fine-tuning—are you aiming for AI self-iteration?
NodeUnderTheAurora
· 3h ago
The mode collapse problem is very real, and reward hacking is a well-known issue.
SeaSaltMarketMakingNotes
· 3h ago
Only ten rounds of search to converge; the sample efficiency is higher than I imagined.
YieldNotYell
· 3h ago
Closing the feedback loop is the soul of automation