Lighthouse Attention is a clever idea: coarse screening first, then precise computation. Long context finally doesn't have to be a struggle.

MeNews
Nous Open-Sources Lighthouse Attention: 17x Speedup at 512K on a Single B200
AIMPACT reports that Nous Research has open-sourced Lighthouse Attention, a mechanism for long-context pretraining. On a single B200 GPU, processing a 512K-token sequence is about 17 times faster; at 98K, end-to-end speed improves by 1.4 to 1.7 times. The mechanism works coarse-to-fine: it uses multi-level summaries to pick out the core segments, concatenates them into a shorter sequence, and hands that sequence to FlashAttention. Because the selection logic sits outside the attention kernel, no low-level code has to be modified and no additional training objective is needed. To keep the model from losing its ability to read text in full because of this skip-reading, training runs mostly in the accelerated mode and switches back to full attention for a brief period at the end. In experiments with a 530-million-parameter model trained on 50 billion tokens, training time drops significantly, and final performance matches or even exceeds traditional baselines.
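The coarse-to-fine flow described above can be illustrated with a small sketch. This is not Nous Research's code: it assumes a single level of summaries (mean-pooled key blocks) instead of the multi-level summaries the report mentions, uses plain softmax attention as a stand-in for FlashAttention, and every name in it (`select_blocks`, `coarse_to_fine_attention`, `attention_mode`, `dense_tail_fraction`) is hypothetical. The 10% dense tail is an arbitrary placeholder; the report only says the switch back to full attention is brief.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(vectors):
    """Average equal-length vectors into one summary vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def select_blocks(query, keys, block_size, top_k):
    """Coarse pass: score each block's pooled key against the query
    and return the token indices of the top_k blocks, in original order."""
    n_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        scores.append((dot(query, mean_pool(block)), b))
    kept = sorted(b for _, b in sorted(scores, reverse=True)[:top_k])
    return [i for b in kept
            for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]

def dense_attention(query, keys, values):
    """Fine pass: ordinary softmax attention over the gathered tokens
    (a stand-in for the FlashAttention call; the kernel itself is untouched)."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, k) * scale for k in keys]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

def coarse_to_fine_attention(query, keys, values, block_size=2, top_k=1):
    """Select blocks outside the kernel, then run dense attention on the short sequence."""
    idx = select_blocks(query, keys, block_size, top_k)
    out = dense_attention(query, [keys[i] for i in idx], [values[i] for i in idx])
    return out, idx

def attention_mode(step, total_steps, dense_tail_fraction=0.1):
    """Assumed schedule: sparse mode for most of training, full attention at the end."""
    switch_step = int(total_steps * (1.0 - dense_tail_fraction))
    return "full" if step >= switch_step else "sparse"

# Usage: four tokens in two blocks; the query matches the second block.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0], [1.0], [5.0], [5.0]]
output, kept_tokens = coarse_to_fine_attention([0.0, 1.0], keys, values)
```

In this toy run only the second block survives the coarse pass, so the dense pass sees two tokens instead of four. The design point the report stresses is visible here too: selection happens before the attention call, so the attention routine needs no changes.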