At ICML 2026, Stanford NLP showed that building an automated executor turns LLM pre-training and post-training into execution environments, where execution feedback can be used to improve research efficiency. The study compared two methods. Evolutionary search beat the GRPO baseline in the post-training task (69.4% vs. 48.0%) and found a pre-training recipe faster than the nanoGPT baseline (19.7 minutes vs. 35.9 minutes), each within ten search rounds. Reinforcement learning on execution rewards was prone to mode collapse: it raised the average reward but did not raise the upper bound. The work points the way for execution-oriented automated AI research.

MeNews

2026-05-20 10:32:22

AIMPACT News, May 15 (UTC+8): The Stanford NLP team presented a new automated AI research project at ICML 2026. By building an automated executor, the team turned LLM pre-training and post-training into execution environments and used execution feedback to improve research effectiveness. The study analyzed two methods. Evolutionary search proved sample-efficient: in the post-training task it found a solution that beat the GRPO baseline (69.4% vs. 48.0%), and in the pre-training task it found a recipe that beat the nanoGPT baseline (19.7 minutes vs. 35.9 minutes), each within ten search rounds. Reinforcement learning on execution rewards, by contrast, suffered from mode collapse: it raised the average reward but not the upper bound. The work offers a direction for execution-oriented automated AI research. (Source: InfoQ)
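For readers unfamiliar with the idea, the loop of "propose candidates, execute them, keep the best, repeat" can be illustrated with a toy sketch. This is not the Stanford team's code. The names `toy_executor` and `evolutionary_search`, the three-number candidates, and every parameter except the ten rounds are assumptions made for illustration. In the reported work the candidates would be research ideas and the executor would run real training jobs; here the executor is a cheap scoring function.

```python
import random


def toy_executor(candidate):
    """Hypothetical stand-in for an automated executor.

    It "runs" a candidate and returns a score (higher is better).
    Here the score is simply closeness to a hidden target vector.
    """
    target = [0.3, -1.2, 0.8]
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def evolutionary_search(executor, rounds=10, population=16, keep=4,
                        sigma=0.3, seed=0):
    """Toy evolutionary search driven only by execution feedback."""
    rng = random.Random(seed)
    # Start from random candidates.
    pool = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(population)]
    best_per_round = []
    for _ in range(rounds):
        # Execute every candidate and rank by the returned score.
        ranked = sorted(pool, key=executor, reverse=True)
        elites = ranked[:keep]
        best_per_round.append(executor(elites[0]))
        # Refill the pool with mutated copies of the elites.
        children = [
            [x + rng.gauss(0, sigma) for x in rng.choice(elites)]
            for _ in range(population - keep)
        ]
        pool = elites + children
    return max(pool, key=executor), best_per_round


if __name__ == "__main__":
    best, history = evolutionary_search(toy_executor)
    for i, score in enumerate(history, start=1):
        print(f"round {i}: best score {score:.4f}")
```

Because the elites are carried over unchanged, the best score in this sketch can never get worse from one round to the next. The search tracks the best candidate rather than the average, which is the quantity the article says reward-based reinforcement learning failed to improve.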


MintColdBrew

· 2h ago

The work for ICML 2026 is already this intense.

QuietRugAlarm

· 3h ago

19 minutes vs 36 minutes, nanoGPT was completely beaten.

Half-MeltedIceCreamPosition

· 3h ago

Evolutionary search beats GRPO, this efficiency improvement is a bit outrageous

AirdropOnTheDune

· 3h ago

An integrated environment for pre-training and fine-tuning—are you aiming for AI self-iteration?

NodeUnderTheAurora

· 3h ago

The mode collapse problem is very real, and reward hacking is a well-known issue.

SeaSaltMarketMakingNotes

· 3h ago

Ten rounds of search to converge, sample efficiency is higher than I imagined.

YieldNotYell

· 3h ago

Closing the feedback loop is the soul of automation

Stanford NLP Team Demonstrates New Advances in Automated AI Research