A new open-source approach to long context. The idea of coarse screening followed by fine computation is quite clever, and switching back to full attention during training to prevent degradation is a thoughtful touch.

MeNews
Nous Open-Sources Lighthouse Attention: 17x Speedup at 512K Tokens on a Single B200
According to AIMPACT, Nous Research has open-sourced Lighthouse Attention, a mechanism for long-context pretraining. On a single B200 GPU, processing 512K tokens is approximately 17 times faster, and at 98K tokens end-to-end speed improves by 1.4 to 1.7 times. The mechanism screens coarsely first and computes finely second: it uses multi-level summaries to select the key segments, concatenates them into a shorter sequence, and hands that sequence to FlashAttention. The selection logic sits outside the attention kernel, so no custom low-level code or additional training objective is needed. To keep the model from losing its ability to read token by token after so much skip-reading, most of training runs in the accelerated mode, with a brief switch back to full attention at the end. In an experiment with 530 million parameters and 50 billion tokens, training time dropped significantly, and final performance matched or even surpassed the conventional baseline.
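The coarse-then-fine idea described above can be illustrated with a small NumPy sketch. This is not Nous Research's implementation: it assumes a single level of mean-pooled block summaries (the brief mentions multi-level summaries), ignores causal masking, and uses plain softmax attention as a stand-in for FlashAttention. The names `coarse_to_fine_attention` and `attention_mode`, the block sizes, and the 10% full-attention tail are all hypothetical, chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(q, k, v):
    """Plain softmax attention; stands in for a fused kernel such as FlashAttention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def coarse_to_fine_attention(q, k, v, block_size=4, top_blocks=2):
    """Coarse pass scores mean-pooled key blocks; fine pass attends only to kept blocks."""
    n, d = k.shape
    assert n % block_size == 0, "sketch assumes the length divides evenly into blocks"
    n_blocks = n // block_size
    # Assumption: one summary per block, taken as the mean of its keys.
    summaries = k.reshape(n_blocks, block_size, d).mean(axis=1)
    # Score each block by the best match any query has with its summary.
    block_scores = (q @ summaries.T).max(axis=0)
    # Keep the highest-scoring blocks, restored to their original order.
    keep = np.sort(np.argsort(block_scores)[-top_blocks:])
    token_idx = (keep[:, None] * block_size + np.arange(block_size)).ravel()
    # Fine pass: ordinary attention over the much shorter gathered sequence.
    return dense_attention(q, k[token_idx], v[token_idx]), token_idx

def attention_mode(step, total_steps, full_attention_fraction=0.1):
    """Hypothetical schedule: sparse mode for most steps, full attention at the end."""
    switch_step = total_steps - int(total_steps * full_attention_fraction)
    return "full" if step >= switch_step else "sparse"

rng = np.random.default_rng(0)
q = rng.standard_normal((3, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
out, idx = coarse_to_fine_attention(q, k, v, block_size=4, top_blocks=2)
```

Because the selection happens before the attention call, the attention routine itself is untouched, which matches the brief's point that no custom low-level code is needed. Keeping every block reduces the sketch to ordinary full attention, which is the mode the schedule falls back to at the end of training.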