Luo Fuli: Large models enter the post-training era, with top teams' pre-training-to-post-training compute ratio reaching 1:1

According to Beating Monitoring, Luo Fuli, head of Xiaomi's large-model team, said that competition among large models has shifted from the pre-training-dominated Chat era to the post-training-dominated Agent era. The key question now is "how to effectively scale reinforcement learning (RL) on Agents."

This paradigm shift is directly reshaping how compute is allocated. Luo Fuli revealed that in the Chat era, the ratio of compute devoted to research, pre-training, and post-training was roughly 3:5:1; in the current Agent era, a reasonable allocation is 3:1:1, meaning top teams now invest as much compute in post-training as in pre-training.
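To make the two ratios concrete, the split can be computed for any fixed compute budget (the 1,000,000 GPU-hour figure below is purely hypothetical, chosen for illustration; the article gives only the ratios):

```python
def split_budget(total_gpu_hours, ratio):
    """Split a compute budget across (research, pre-training, post-training)."""
    total_parts = sum(ratio)
    return tuple(total_gpu_hours * part / total_parts for part in ratio)

# Hypothetical 1,000,000 GPU-hour budget
chat_era = split_budget(1_000_000, (3, 5, 1))   # Chat era: pre-training dominates
agent_era = split_budget(1_000_000, (3, 1, 1))  # Agent era: pre- and post-training equal
```

Under 3:5:1, post-training gets roughly a ninth of the budget; under 3:1:1, pre-training and post-training each receive an equal fifth, which is the 1:1 parity Luo Fuli describes.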

At the same time, the requirements on system architecture have changed significantly. RL infrastructure was previously built around the model inference engine and handled pure-text computation; it must now be built around the Agent, supporting heterogeneous cluster scheduling and tolerating the interruptions that uncontrollable external factors can cause during complex agent workflows.
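The article does not describe a concrete implementation of this tolerance. As a loose illustration only, an agent-centric collector might retry interrupted episodes and keep partial trajectories rather than discard them; every name and the failure model below are hypothetical:

```python
import random

def run_agent_episode(seed, max_steps=8, fail_rate=0.3):
    """Simulate one agent rollout; external steps (tool calls, sandboxes) may fail."""
    rng = random.Random(seed)
    trajectory = []
    for step in range(max_steps):
        if rng.random() < fail_rate:
            return trajectory, False  # interrupted mid-workflow (timeout, crash, ...)
        trajectory.append(f"step-{step}")
    return trajectory, True

def collect_rollouts(num_episodes, max_retries=2):
    """Retry interrupted episodes; keep surviving partial trajectories for RL."""
    complete, partial = [], []
    for i in range(num_episodes):
        for attempt in range(max_retries + 1):
            traj, finished = run_agent_episode(seed=i * 10 + attempt)
            if finished:
                complete.append(traj)
                break
        else:
            partial.append(traj)  # tolerate the interruption instead of failing the run
    return complete, partial
```

The design point this sketches is the shift the article describes: the scheduling unit is the agent episode, not an inference request, so the system must absorb mid-trajectory failures as a normal case.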
