Stop blindly piling on computing power! Research shows that large models become more "rigid" the longer they are trained, and adding parameters only delays it.

ME AI News: According to Beating monitoring, as training time increases, an AI model gradually loses its ability to absorb new knowledge, a phenomenon known as loss of plasticity. In the end, the more it trains, the more rigid it becomes. If loss of plasticity cannot be overcome, large models will never be able to learn continually at low cost. Every knowledge update would instead require retraining on all historical data combined with the new data, consuming massive amounts of computing power.

AI startup Zyphra's latest research is the first to show that increasing model size can delay this degradation, but with diminishing marginal returns: simply stacking parameters cannot fundamentally cure loss of plasticity. By extrapolation, a 1B-parameter model starts getting dumber after training on 1.8 trillion tokens, while a 7B model shows signs of degradation after 9 trillion. Even more surprising: loss of plasticity occurs even without task switching, when the model is simply trained on a stable mixed dataset.
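The two extrapolated thresholds can be compared with simple arithmetic. The sketch below uses only the figures quoted above and is not Zyphra's extrapolation method; the names `THRESHOLDS`, `tokens_per_parameter` and `marginal_tokens_per_parameter` are made up for illustration. A 7x increase in parameters pushes the threshold out by only 5x, so each parameter absorbs fewer tokens before degradation sets in.

```python
# Extrapolated thresholds quoted in the article:
# parameter count -> training tokens before loss of plasticity appears.
THRESHOLDS = {
    1e9: 1.8e12,  # 1B-parameter model
    7e9: 9.0e12,  # 7B-parameter model
}


def tokens_per_parameter(params: float, tokens: float) -> float:
    """Average number of training tokens per parameter before degradation."""
    return tokens / params


def marginal_tokens_per_parameter(p_small: float, p_large: float) -> float:
    """Extra tokens of headroom gained per extra parameter when scaling up."""
    extra_tokens = THRESHOLDS[p_large] - THRESHOLDS[p_small]
    return extra_tokens / (p_large - p_small)


small, large = sorted(THRESHOLDS)
for p in (small, large):
    ratio = tokens_per_parameter(p, THRESHOLDS[p])
    print(f"{p / 1e9:.0f}B params: {ratio:,.0f} tokens per parameter")

print(f"parameters x{large / small:.1f}, "
      f"threshold x{THRESHOLDS[large] / THRESHOLDS[small]:.1f}")
print(f"marginal: {marginal_tokens_per_parameter(small, large):,.0f} "
      "tokens per added parameter")
```

It prints 1,800 tokens per parameter for the 1B model, about 1,286 for the 7B model, and 1,200 tokens per added parameter at the margin: the extra six billion parameters buy less headroom each than the first billion did, which is what diminishing returns means here.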

The study identifies three direct reasons why large models get dumber. First, the magnitude of the parameters keeps growing during training, and under LayerNorm this growth impedes gradient propagation. Second, large numbers of neurons in the MLP layers go dormant (a "work stoppage"); in some models as many as 95% of neurons become dormant. Third, attention heads break down, either collapsing to focus only on a few tokens or "phoning it in" by spreading attention evenly across the entire context.

For these pathologies, potential remedies include limiting parameter growth, periodically applying a "neural reset" that forcibly reactivates dormant neurons, and injecting random noise into the attention mechanism to forcibly correct these deviations.
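The symptoms and two of the remedies described above can be sketched in a few plain-Python functions. This is a hypothetical illustration, not Zyphra's code, and every name and default value in it is an assumption. The dormancy rule (a neuron is dormant when its mean absolute activation falls below a fraction `tau` of the layer average) and the reset (re-draw a dormant neuron's incoming weights, zero its outgoing weights) are borrowed from the ReDo method in the reinforcement-learning literature. The entropy cutoffs for labelling attention heads are arbitrary. `cap_norm` is a generic max-norm constraint, only one possible way to limit parameter growth. The attention-noise remedy is not shown.

```python
import math
import random


def dormant_mask(activations, tau=0.025):
    """Flag dormant neurons in one MLP layer (ReDo-style rule, assumed).

    activations: list of samples, each a list of per-neuron activations.
    A neuron's score is its mean |activation| divided by the layer-wide
    average of those means; score <= tau counts as dormant.
    """
    n = len(activations[0])
    means = [sum(abs(s[j]) for s in activations) / len(activations)
             for j in range(n)]
    layer_avg = sum(means) / n
    if layer_avg == 0:
        return [True] * n  # the whole layer is silent
    return [m / layer_avg <= tau for m in means]


def attention_entropy(weights):
    """Shannon entropy (nats) of one head's attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0)


def classify_head(weights, low=0.1, high=0.95):
    """Label a head by entropy relative to the uniform maximum log(n).

    weights must sum to 1 and contain at least two entries.
    """
    ratio = attention_entropy(weights) / math.log(len(weights))
    if ratio < low:
        return "collapsed"  # fixates on one or a few tokens
    if ratio > high:
        return "uniform"    # smears attention evenly over the context
    return "healthy"


def reset_dormant(w_in, w_out, mask, scale=0.02, rng=random):
    """Hypothetical 'neural reset' for flagged neurons, in place.

    w_in:  one row per hidden neuron (neuron j's incoming weights).
    w_out: one row per output unit (w_out[k][j] links neuron j to unit k).
    """
    for j, dormant in enumerate(mask):
        if dormant:
            # Fresh random incoming weights give the neuron a chance to fire.
            w_in[j] = [rng.gauss(0.0, scale) for _ in w_in[j]]
            # Zeroed outgoing weights keep the layer's output unchanged.
            for row in w_out:
                row[j] = 0.0


def cap_norm(row, max_norm):
    """Rescale a weight vector whose L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(w * w for w in row))
    if norm > max_norm:
        return [w * max_norm / norm for w in row]
    return row
```

For example, with activations `[[0.9, 0.0, 1.2, 0.001], [1.1, 0.0, 0.8, 0.0]]`, `dormant_mask` flags the second and fourth neurons. Zeroing the outgoing weights means a reset neuron initially contributes nothing, so the reset does not disturb what the model has already learned.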

(Source: BlockBeats)

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.