Open source, and 512K context running on a single B200 card: the engineering bottleneck for long-context models is being chipped away at again. Looking forward to seeing what the community builds next.

MeNews
Nous Open-Sources Lighthouse Attention: 17x Speedup at 512K on a Single B200
AIMPACT reports that Nous Research has open-sourced Lighthouse Attention, a mechanism for long-context pretraining. On a single B200 card, processing a 512K-token context is about 17 times faster, and at 98K the end-to-end speedup is 1.4x to 1.7x. The mechanism works coarse-to-fine: it first uses multi-level summaries to pick out the most relevant segments, stitches them into a much shorter sequence, and then hands that sequence to FlashAttention. The selection logic sits outside the kernel, so it requires neither changes to low-level code nor an additional training objective. To keep the model from losing its ability to read token by token because of this skip-reading, training runs in the accelerated mode for most of the process and briefly switches back to full attention at the end. In experiments with 530 million parameters and 50 billion tokens, training time drops significantly, and final performance matches or even surpasses the traditional baselines.
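The select-then-attend idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Nous Research's implementation: it uses a single level of summaries (the mean key of each block) instead of multi-level ones, plain softmax attention as a stand-in for FlashAttention, and a made-up 10% full-attention tail for the training schedule. All function names and parameters here are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense_attention(query, keys, values):
    """Plain softmax attention for one query (stand-in for a FlashAttention call)."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def select_blocks(query, keys, block_size, top_k):
    """Coarse pass: score each block by its mean key and keep the top_k blocks.

    Assumption: mean pooling as the summary; the source does not say how
    summaries are built.
    """
    scored = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        summary = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(query, summary), start))
    best = sorted(scored, reverse=True)[:top_k]
    return sorted(start for _, start in best)  # restore original token order

def selective_attention(query, keys, values, block_size=4, top_k=2):
    """Fine pass: run dense attention only over tokens in the selected blocks."""
    starts = select_blocks(query, keys, block_size, top_k)
    idx = [i for s in starts for i in range(s, min(s + block_size, len(keys)))]
    out = dense_attention(query, [keys[i] for i in idx], [values[i] for i in idx])
    return out, idx

def attention_mode(step, total_steps, full_tail_fraction=0.1):
    """Hypothetical schedule: selective attention for most steps, full at the end."""
    return "full" if step >= total_steps * (1 - full_tail_fraction) else "selective"
```

Because the selection happens before the attention call, the attention routine itself is untouched: it simply receives a shorter list of keys and values. Setting `top_k` high enough to cover every block reduces the sketch to ordinary full attention.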