MiniMax open-sources Blackwell-exclusive attention library; M3 weights expected this Friday

ME AI News: According to Beating monitoring, Ryan Lee, head of developer relations at MiniMax, announced that MiniMax Sparse Attention (MSA), a high-performance attention library for NVIDIA Blackwell (SM100) GPUs, has been open-sourced under the MIT license. Lee also said the MiniMax-M3 weights are expected to be released this Friday.

MSA is used for million-token context inference in MiniMax-M3: it selects the most relevant KV blocks within each GQA group and computes attention only over the selected blocks. According to the paper, at a context length of 1 million tokens, MSA reduces attention computation by a factor of 28.4 compared with dense GQA in the same configuration, and delivers a 14.2x prefill speedup and a 7.6x decoding speedup on H800 GPUs.
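To make the block-selection idea concrete, here is a minimal pure-Python sketch of top-k block-sparse attention for a single GQA group. It is not MSA's kernel or API: the function names (`select_blocks`, `sparse_group_attention`), the scoring rule (mean-pooled block keys, summed over the query heads in the group), and the omission of causal masking are all assumptions made for illustration, since the article does not describe how MSA scores blocks.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_blocks(group_queries, keys, block_size, top_k):
    """Pick top_k KV blocks for one GQA group (hypothetical scoring rule)."""
    n_blocks = math.ceil(len(keys) / block_size)
    dim = len(keys[0])
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        # Assumption: represent each block by the mean of its keys.
        mean_key = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        # All query heads in the group share one KV head, so they also
        # share one block selection; sum their scores.
        scores.append(sum(dot(q, mean_key) for q in group_queries))
    ranked = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:top_k])


def sparse_group_attention(group_queries, keys, values, block_size, top_k):
    """Attend only over the selected blocks; one output vector per query head."""
    chosen = select_blocks(group_queries, keys, block_size, top_k)
    # Token indices covered by the selected blocks.
    idx = [i for b in chosen
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    outputs = []
    for q in group_queries:
        weights = softmax([dot(q, keys[i]) * scale for i in idx])
        outputs.append([sum(w * values[i][d] for w, i in zip(weights, idx))
                        for d in range(dim_v)])
    return outputs, chosen
```

In this sketch the attention cost scales with `top_k * block_size` tokens rather than the full context length, which is the general source of the savings reported for sparse attention. Setting `top_k` to the total number of blocks recovers ordinary dense attention for the group.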

The open-source release bundles two implementations, C++ JIT and CuTe-DSL, in a single Python package. It provides both Dense FlashAttention and Sparse Top-k Attention kernels and supports multiple precision formats, including BF16, FP8, NVFP4, and FP4. Currently, it is mainly deployed on NVIDIA Blackwell (SM100) GPUs.

(Source: BlockBeats)
