ME AI news: according to Beating monitoring, the longer an AI model is trained, the more it loses its ability to absorb new knowledge, a phenomenon known as loss of plasticity. In short, the more it trains, the more rigid it becomes. Unless loss of plasticity is overcome, large models cannot learn continually at low cost: every knowledge update requires retraining on all historical data together with the new data, which consumes enormous computing power.

AI startup Zyphra’s latest research is the first to show that increasing model size can delay the degradation, but with diminishing marginal returns, so simply stacking parameters cannot cure loss of plasticity. By extrapolation, a 1B-parameter model would start to degrade after training on 1.8 trillion tokens, while a 7B model would show signs after 9 trillion. More striking still, loss of plasticity occurs even without task switching, when the model is simply trained on a stable mixed dataset.
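The two extrapolated figures allow a rough back-of-the-envelope check of the diminishing-returns claim. The sketch below assumes, purely for illustration, that the onset of degradation follows a power law in parameter count fitted through just these two points. The function names are hypothetical, and the fit is not taken from the study.

```python
import math

# Extrapolated onset points quoted in the article: (parameters, training tokens).
SMALL = (1e9, 1.8e12)
LARGE = (7e9, 9e12)

def power_law_exponent(p1, p2):
    """Exponent k in tokens ~ params**k through two points (an assumed form)."""
    (n1, t1), (n2, t2) = p1, p2
    return math.log(t2 / t1) / math.log(n2 / n1)

def tokens_per_param(point):
    """Training tokens available per parameter before degradation sets in."""
    n, t = point
    return t / n

exponent = power_law_exponent(SMALL, LARGE)
print(f"exponent: {exponent:.2f}")
print(f"tokens per parameter: {tokens_per_param(SMALL):.0f} vs {tokens_per_param(LARGE):.0f}")
```

Under that assumption the exponent comes out at roughly 0.83: seven times the parameters buys only five times the tokens before degradation, and the token budget per parameter shrinks from 1,800 to about 1,286.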

The study identifies three direct causes of the degradation. First, the magnitude of the parameters keeps growing during training, which under LayerNorm obstructs gradient propagation. Second, neurons in the MLP layers go dormant on a large scale (“work stoppage”); in some models as many as 95% of neurons are dormant. Third, attention heads either collapse, attending only to a few tokens, or “phone it in” by spreading attention evenly across the entire context. Potential remedies for these pathologies include limiting parameter growth, periodically giving dormant neurons a “neural reset” to forcibly reactivate them, and injecting random noise into the attention mechanism to correct the deviation.
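Two of these symptoms are straightforward to measure, and a minimal NumPy sketch may make them concrete. Everything here is an assumption for illustration rather than the study's method: the dormancy criterion (mean absolute activation, normalized by the layer average, below a threshold) is one common convention, attention entropy is used as a stand-in for both head collapse and even smearing, and the function names, threshold, and re-initialization scale are hypothetical.

```python
import numpy as np

def dormant_mask(activations, threshold=0.01):
    """Flag neurons whose normalized mean |activation| is at or below threshold.

    activations: (batch, neurons) array of post-nonlinearity MLP outputs.
    """
    score = np.abs(activations).mean(axis=0)
    score = score / (score.mean() + 1e-12)  # normalize by the layer average
    return score <= threshold

def dormant_fraction(activations, threshold=0.01):
    """Share of neurons in the layer that are flagged as dormant."""
    return float(dormant_mask(activations, threshold).mean())

def attention_entropy(attn):
    """Entropy of attention rows (last axis sums to 1).

    Near 0 means the head has collapsed onto one token;
    near log(num_keys) means attention is spread evenly.
    """
    p = np.clip(attn, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def reset_dormant(w_in, activations, rng, threshold=0.01, scale=0.02):
    """Re-draw the incoming weights of dormant neurons (a simple 'neural reset').

    w_in: (inputs, neurons) weight matrix feeding the layer.
    """
    mask = dormant_mask(activations, threshold)
    w_new = w_in.copy()
    w_new[:, mask] = rng.normal(0.0, scale, size=(w_in.shape[0], int(mask.sum())))
    return w_new, mask

# Toy data: 8 ReLU neurons, 6 of them silenced by hand.
rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(size=(64, 8)), 0.0)
acts[:, :6] = 0.0
w_in = np.zeros((4, 8))
w_reset, mask = reset_dormant(w_in, acts, rng)

collapsed = np.array([1.0, 0.0, 0.0, 0.0])  # head fixated on one token
smeared = np.full(4, 0.25)                  # head spread evenly over 4 tokens

print("dormant fraction:", dormant_fraction(acts))
print("entropy (collapsed, smeared):", attention_entropy(collapsed), attention_entropy(smeared))
```

In the toy data six of eight neurons are flagged, so the dormant fraction is 0.75. The collapsed head has entropy close to zero while the evenly smeared head sits at the maximum, log 4. A real training loop would track both quantities per layer over time and would likely need to handle outgoing weights and optimizer state when resetting.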

(Source: BlockBeats)

Stop blindly piling on computing power! Research shows that large models become more "rigid" as they are trained, and increasing parameters won't help.
