Luo Fuli: Large models enter the post-training era, with top teams' pre-training-to-post-training compute ratio reaching 1:1


ME News Report, April 24 (UTC+8): According to Dongcha Beating monitoring, Luo Fuli, head of Xiaomi's large model team, said that competition among large models has shifted from the Chat era, dominated by pretraining, to the Agent era, dominated by post-training. The core question now is "how to scale reinforcement learning (RL) effectively on Agents." This paradigm shift is directly reshaping compute allocation. Luo Fuli revealed that in the Chat era, the compute ratio across research, pretraining, and post-training was roughly 3:5:1; in the current Agent era, a reasonable allocation is 3:1:1, meaning top teams now invest about as much compute in post-training as in pretraining. System-architecture requirements have also changed substantially. RL infrastructure used to be built around model inference engines for pure-text computation; it must now be built around Agents, supporting heterogeneous cluster scheduling and tolerating the interruptions that various uncontrollable factors introduce into complex Agent workflows. (Source: BlockBeats)
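To make the ratio shift concrete, the sketch below (an illustrative calculation, not from the report) converts the two quoted research:pretraining:post-training ratios into fractional shares of total compute, showing that under 3:1:1 the pretraining and post-training shares are exactly equal:

```python
from fractions import Fraction

def shares(research, pretrain, post_train):
    # Convert a research:pretrain:post-train ratio into exact
    # fractional shares of total compute.
    total = research + pretrain + post_train
    return {
        "research": Fraction(research, total),
        "pretrain": Fraction(pretrain, total),
        "post_train": Fraction(post_train, total),
    }

chat_era = shares(3, 5, 1)   # Chat era ratio 3:5:1
agent_era = shares(3, 1, 1)  # Agent era ratio 3:1:1

# Chat era: pretraining dwarfs post-training (5/9 vs 1/9).
print(chat_era["pretrain"], chat_era["post_train"])
# Agent era: pretraining and post-training shares are equal (1/5 each).
print(agent_era["pretrain"], agent_era["post_train"])
```

Under the Chat-era split, pretraining received five times the post-training budget; under the Agent-era split, the two are 1:1, which is the headline claim.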
