YC Partner: Instead of competing over model size, let AI evolve itself by writing code like scientists do

According to monitoring by Beating, Y Combinator partner Diana Hu argued on X that the next frontier is not simply scaling up parameter counts but building a thin software layer on top of the foundation model, so that the AI can write the rules for solving a problem the way a programmer does (executable world models). The AI can then keep testing, revising, and simplifying that code based on the results of each run, without costly fine-tuning of the large model itself.

The gradient-free code learning path is corroborated by the Heuristic Learning paradigm proposed last month by Wang Jiayi, a core member of OpenAI's post-training team. Traditional reinforcement learning needs tens of thousands of trial-and-error iterations for an AI to learn a task, forcing experience into the black box of the neural network, which is highly energy-intensive and prone to forgetting. Wang Jiayi's experiments showed that, without adjusting any of the large model's parameters and relying purely on the model to write Python code, find bugs, and devise debugging rules, the system can clear Atari Breakout. This indicates that knowledge can be carried by a human-readable, testable body of code rather than by neural network weights that people cannot interpret.

In the view of Paul Graham, a co-founder of YC, the loop of writing code, verifying it, and compressing it closely resembles a scientist's everyday research routine. Large models do not need to rebuild their "brain"; instead, like scientists, they write hypothesis models of a new environment in code, run that code as validation experiments, and distill the simplest rules that solve the problem. Finding the simplest program is also the ultimate yardstick that ARC-AGI uses to measure the efficiency of artificial intelligence.
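The write, verify, and compress loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `propose_candidates` is a hypothetical stand-in for a model that writes candidate programs, the observations are invented, and source length is used as a crude proxy for simplicity. It is not the method used in the experiments mentioned here.

```python
from typing import Callable, List, Optional, Tuple

Observation = Tuple[int, int]  # (input, observed output) from the environment


def propose_candidates() -> List[str]:
    """Hypothetical stand-in for a model that writes candidate programs."""
    return [
        "def predict(x):\n    return x + 2",
        "def predict(x):\n    return 2 * x",
        "def predict(x):\n    return x * 2 if x >= 0 else -(-x * 2)",
    ]


def compile_candidate(source: str) -> Callable[[int], int]:
    """Turn candidate source code into a callable hypothesis."""
    namespace: dict = {}
    exec(source, namespace)  # acceptable here only because the toy input is trusted
    return namespace["predict"]


def passes(source: str, observations: List[Observation]) -> bool:
    """Verify: run the hypothesis against every observation."""
    try:
        predict = compile_candidate(source)
        return all(predict(x) == y for x, y in observations)
    except Exception:
        return False  # a crashing program is a failed hypothesis


def simplest_consistent(
    candidates: List[str], observations: List[Observation]
) -> Optional[str]:
    """Compress: among the programs that pass, keep the shortest one."""
    survivors = [c for c in candidates if passes(c, observations)]
    return min(survivors, key=len) if survivors else None


# Invented observations from an "unknown" environment that doubles its input.
observations = [(0, 0), (1, 2), (2, 4), (3, 6), (-1, -2)]
best = simplest_consistent(propose_candidates(), observations)
print(best)
```

No model weights are touched at any point: the only thing that changes between rounds is which program text survives. In a fuller loop, the failing cases would be fed back to the proposer to write revised candidates.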

The biggest payoff is that gradient-free learning benefits directly from improvements in the underlying large model. As the underlying model becomes smarter, the code and strategies written by agents will grow exponentially stronger. Building on Richard Sutton's famous essay "The Bitter Lesson", gradient-free code learning is tracing a brand-new S-curve. As large models' coding capabilities explode, this path of AI self-evolution is raising the curtain on the next generation of artificial intelligence paradigms.
