MIT Researchers Reveal the Strong Superposition Mechanism in LLMs: Doubling Model Width Halves the Error Rate

AIMPACT News, May 3 (UTC+8). MIT researchers have revealed the mechanism by which large language model performance scales reliably with size, providing the first experimental validation of the "superposition" phenomenon. The study finds that LLMs bypass dimensionality limits by storing multiple concepts within the same dimensions; this "strong superposition" lets a model represent all concepts simultaneously, with errors arising from the interference noise those overlaps generate. The team validated the finding on Anthropic's simplified toy model and on open-source models including OPT, GPT-2, Qwen2.5, and Pythia: doubling model width cuts the error roughly in half, with a measured scaling exponent of 0.91, close to the theoretical value of 1. The work answers two key questions: scaling stops paying off once model width matches the vocabulary size, and, for natural-language tasks, the relative flatness of the word-frequency distribution limits how quickly performance can improve, but architectures designed to encourage superposition can achieve better performance at the same scale.
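To make the "overlap noise halves when width doubles" claim concrete, here is a minimal sketch of the underlying intuition, not the paper's actual model: if many more concepts than dimensions are packed into a space as random unit vectors, reading out one concept picks up cross-talk from its overlaps with all the others, and that interference falls roughly as 1/width. The function name `interference_error`, the concept count `m = 4096`, and the random-vector setup are all illustrative assumptions introduced here, not details from the article.

```python
import numpy as np

def interference_error(d, m, trials=5, seed=0):
    """Mean squared cross-talk when m concepts share d dimensions (m > d).

    Illustrative toy setup, not the MIT paper's model: each concept is a
    random unit vector in R^d. With m > d the vectors cannot all be
    orthogonal, so the expected squared overlap between any two is ~1/d,
    and the total interference per concept scales like (m - 1) / d,
    halving with every doubling of the width d (scaling exponent 1).
    """
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        W = rng.standard_normal((m, d))
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit concept vectors
        G = W @ W.T                                    # pairwise overlaps
        np.fill_diagonal(G, 0.0)                       # ignore self-overlap
        errs.append((G ** 2).sum(axis=1).mean())       # cross-talk per concept
    return float(np.mean(errs))

m = 4096  # more concepts than any width below -> forced superposition
widths = np.array([64, 128, 256, 512, 1024])
errors = np.array([interference_error(d, m) for d in widths])
for d, e in zip(widths, errors):
    print(f"width d={d:5d}  interference ~ {e:.2f}")

# Fit the log-log slope: a value near -1 mirrors the near-1 scaling
# exponent (0.91) the researchers measured in real open-source models.
slope = np.polyfit(np.log(widths), np.log(errors), 1)[0]
print(f"fitted scaling exponent: {slope:.2f}")
```

Running this shows the interference dropping by about half at each doubling of `d`, with a fitted exponent close to -1; in the toy setup the noise also vanishes once width reaches the concept count, which loosely parallels the article's point that scaling stops paying off when width matches vocabulary size.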
