According to Beating monitoring, Y Combinator partner Diana Hu wrote on X that the next frontier lies less in simply scaling up parameter counts than in building a thin software layer on top of the foundation model, so that AI can write the rules for solving problems the way a programmer does (executable world models). The AI can then keep testing, modifying, and streamlining that code based on the results of each run, without costly fine-tuning of the large model itself.

This gradient-free, code-based learning path is corroborated by the Heuristic Learning paradigm proposed last month by Wang Jiayi, a core member of OpenAI’s post-training team. Traditional reinforcement learning needs tens of thousands of trial-and-error iterations for an AI to learn a task, forcing experience into the neural network’s black box, which is highly energy-intensive and prone to forgetting. Wang Jiayi’s experiments showed that, without adjusting any of the large model’s parameters and relying purely on the model itself to write Python code, find bugs, and devise debugging rules, the system can clear Atari Breakout. This indicates that knowledge can be carried by a human-readable, testable code system rather than by neural network weights that people cannot interpret.
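The write, run, and keep-what-works loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy "catch the ball" environment stands in for Atari Breakout, `propose` is a hypothetical placeholder for a call to a large model that returns policy source code, and the three hard-coded candidates stand in for successive revisions. It is not the method used in the Heuristic Learning experiments; it only shows that the thing being improved is readable code, while no model weights change.

```python
import random
from typing import Callable, List, Tuple

# Candidate policies are plain Python source, standing in for code a model might write.
# Each defines act(ball_x, paddle_x) and returns -1 (left), 0 (stay), or +1 (right).
CANDIDATES: List[str] = [
    "def act(ball_x, paddle_x):\n    return 0\n",
    "def act(ball_x, paddle_x):\n    return 1\n",
    "def act(ball_x, paddle_x):\n"
    "    if ball_x > paddle_x:\n        return 1\n"
    "    if ball_x < paddle_x:\n        return -1\n"
    "    return 0\n",
]


def propose(round_index: int) -> str:
    """Hypothetical stand-in for a large-model call that returns revised policy code."""
    return CANDIDATES[round_index % len(CANDIDATES)]


def compile_policy(source: str) -> Callable[[int, int], int]:
    """Turn policy source into a callable; raises if the source is broken."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["act"]


def evaluate(policy: Callable[[int, int], int], episodes: int = 20,
             width: int = 10, steps: int = 10, seed: int = 0) -> float:
    """Toy environment: the paddle has `steps` moves to get under a fixed ball column."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(episodes):
        paddle_x = width // 2
        ball_x = rng.randrange(width)
        for _ in range(steps):
            move = max(-1, min(1, policy(ball_x, paddle_x)))
            paddle_x = max(0, min(width - 1, paddle_x + move))
        hits += int(paddle_x == ball_x)
    return hits / episodes


def learn(rounds: int = 3) -> Tuple[str, float]:
    """Keep the best-scoring source text; the 'knowledge' is the code itself."""
    best_source, best_score = "", -1.0
    for i in range(rounds):
        source = propose(i)
        try:
            score = evaluate(compile_policy(source))
        except Exception:
            continue  # a candidate that fails to compile or run is simply discarded
        if score > best_score:
            best_source, best_score = source, score
    return best_source, best_score


if __name__ == "__main__":
    source, score = learn()
    print(f"best score: {score:.2f}")
    print(source)
```

In a real system the proposer would presumably see the previous candidate's score and failures before writing the next revision; the sketch omits that feedback channel to stay short.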

In Hu’s view, the loop of writing code, verifying it, and compressing it closely resembles a scientist’s everyday research routine. Large models do not need to rewire the “brain”; instead, like scientists, they write hypothesis models of a new environment in code, run that code as validation experiments, and distill the simplest rules that solve the problem. Finding the simplest program is also the ultimate standard by which ARC-AGI measures the efficiency of artificial intelligence.
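The "distill the simplest rules" step can be sketched as a small selection problem: among hypotheses that agree with every observation, keep the shortest. The observations, the candidate expressions, and the use of source length as a proxy for simplicity are all assumptions chosen for illustration; this is not how ARC-AGI itself is scored.

```python
from typing import List, Optional, Tuple

# Hypothetical observations of an unknown rule, as (input, output) pairs.
OBSERVATIONS: List[Tuple[int, int]] = [(0, 1), (1, 3), (2, 5), (3, 7)]

# Hypothetical candidate rules, written as Python expressions in x.
HYPOTHESES: List[str] = [
    "x + 1",
    "x * x + 1",
    "2 * x + 1",
    "2 * x + 1 if x < 100 else 0",
]


def is_consistent(expr: str, observations: List[Tuple[int, int]]) -> bool:
    """Run the hypothesis as code and check it against every observation."""
    rule = eval("lambda x: " + expr)
    return all(rule(x) == y for x, y in observations)


def simplest_consistent(hypotheses: List[str],
                        observations: List[Tuple[int, int]]) -> Optional[str]:
    """Return the shortest hypothesis that explains all observations, if any."""
    survivors = [h for h in hypotheses if is_consistent(h, observations)]
    return min(survivors, key=len) if survivors else None


if __name__ == "__main__":
    print(simplest_consistent(HYPOTHESES, OBSERVATIONS))
```

Two of the candidates fit the data equally well; the length criterion prefers the one without the unneeded special case, which is the compression step in the loop.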

The most critical dividend is that gradient-free learning rides directly on improvements in the underlying large model’s capabilities. As the underlying model becomes smarter, the code and strategies that agents write become stronger along with it. Building on Richard Sutton’s famous essay The Bitter Lesson, gradient-free code learning is tracing a brand-new S-curve. As large models’ coding capabilities surge, AI self-evolution is raising the curtain on the next generation of artificial intelligence paradigms.

YC Partner: Instead of competing over model size, let AI evolve itself by writing code like scientists do
