Nous Research Releases Lighthouse Attention, Speeding Up Long-Sequence Pretraining by 1.4-1.7x

AIMPACT News, May 17 (UTC+8): Nous Research has introduced Lighthouse Attention, a selective hierarchical attention mechanism that addresses the quadratic growth of attention cost in long-sequence pretraining. The method applies symmetric pooling to the Query, Key, and Value tensors and keeps the selection logic outside the attention kernel, so the existing FlashAttention kernel can be reused unchanged, and it trains with a two-stage strategy. Benchmarks on NVIDIA B200 show a 21x forward-pass speedup at around 512K context length and a 17.3x combined forward + backward speedup, with the first training stage reaching a throughput of 126k tokens/sec/GPU versus 46k for dense SDPA. End-to-end training accelerates by 1.40x to 1.69x while matching or improving on the dense baseline's training loss. Validation on a 530M-parameter Llama-3-style model shows the three Lighthouse runs reaching final losses of 0.698-0.71, better than the dense SDPA baseline trained from scratch (0.7237), while saving 22.5-27 hours of training time. Paper: arXiv:2605.06554.
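The description above is enough to sketch the core idea in PyTorch: pool queries and keys into block summaries, score block pairs, keep a small set of key/value blocks per query block, and hand the reduced set to a standard fused attention kernel, so the selection step stays outside the kernel itself. The code below is a minimal sketch under those assumptions; the function name, the mean-pooling choice, the top-k selection rule, and the omission of causal masking and the two-stage training schedule are illustrative, not details taken from the paper.

```python
# Minimal sketch of block-selective attention (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def block_selective_attention(q, k, v, block=128, topk=4):
    # q, k, v: (batch, heads, seq, dim); seq assumed divisible by `block`
    B, H, S, D = q.shape
    nb = S // block

    # Symmetric pooling: mean-pool queries and keys into per-block summaries.
    qb = q.reshape(B, H, nb, block, D).mean(dim=3)        # (B, H, nb, D)
    kb = k.reshape(B, H, nb, block, D).mean(dim=3)        # (B, H, nb, D)

    # Coarse block-to-block scores decide which key blocks each query block keeps.
    scores = torch.einsum("bhqd,bhkd->bhqk", qb, kb)      # (B, H, nb, nb)
    sel = scores.topk(topk, dim=-1).indices               # (B, H, nb, topk)

    k_blocks = k.reshape(B, H, nb, block, D)
    v_blocks = v.reshape(B, H, nb, block, D)
    out = torch.zeros_like(q)
    for qi in range(nb):
        # Gather the selected key/value blocks for this query block.
        idx = sel[:, :, qi]                                # (B, H, topk)
        gather = idx[..., None, None].expand(B, H, topk, block, D)
        k_sel = torch.gather(k_blocks, 2, gather).reshape(B, H, topk * block, D)
        v_sel = torch.gather(v_blocks, 2, gather).reshape(B, H, topk * block, D)
        q_blk = q[:, :, qi * block:(qi + 1) * block]
        # Selection happened outside the kernel, so a fused SDPA/FlashAttention
        # call runs unchanged on the reduced key/value set.
        out[:, :, qi * block:(qi + 1) * block] = F.scaled_dot_product_attention(q_blk, k_sel, v_sel)
    return out

# Example: 8K tokens, 128-token blocks, each query block attends to 4 key blocks.
q = k = v = torch.randn(1, 8, 8192, 64)
out = block_selective_attention(q, k, v, block=128, topk=4)
```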
