Luofuli: Large models enter the post-training era, with top teams' pre-training and post-training computing power ratio reaching 1:1

CoinWorld News reports that Luofuli says the competition among large models has shifted from the chat era, dominated by pre-training, to the agent era, dominated by post-training. The key question now is how to scale reinforcement learning effectively on agents. She revealed that in the chat era, the compute devoted to research, pre-training, and post-training was split roughly 3:5:1, whereas in today's agent era a reasonable allocation is 3:1:1, meaning pre-training and post-training now receive roughly equal investment. Top model teams have already reached a 1:1 ratio between the two.

The requirements on system architecture have changed accordingly. Reinforcement learning infrastructure used to center on the model inference engine and handle pure text computation. Now it must be agent-centric, supporting heterogeneous cluster scheduling and tolerating interruptions from the many uncontrollable factors that arise in complex agent workflows.
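The quoted ratios can be converted into compute shares with a few lines of arithmetic (an illustrative sketch; only the ratio values 3:5:1 and 3:1:1 come from the report, the helper function is hypothetical):

```python
def ratio_to_shares(ratio):
    """Normalize a ratio tuple (research, pre-training, post-training)
    into fractions of total compute."""
    total = sum(ratio)
    return [r / total for r in ratio]

# Chat era: research : pre-training : post-training = 3 : 5 : 1
chat_era = ratio_to_shares((3, 5, 1))
# Agent era: 3 : 1 : 1, so pre- and post-training are equal
agent_era = ratio_to_shares((3, 1, 1))

for label, (research, pre, post) in [("chat era", chat_era),
                                     ("agent era", agent_era)]:
    print(f"{label}: research {research:.0%}, "
          f"pre-training {pre:.0%}, post-training {post:.0%}")
```

Under the chat-era split, post-training receives only a fifth of the pre-training budget; under the agent-era split, the two are equal, matching the 1:1 investment the article attributes to top teams.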
