Tencent HunYuan proposes the Stem sparse attention algorithm, cutting first-token latency by 3.6 times

Mars Finance News: On June 5, Tencent HunYuan announced that its Stem sparse attention algorithm has been accepted by ICML 2026, a top machine learning conference. The work is a full-stack acceleration solution that pairs the Stem algorithm with HPC operators. At the algorithm level, Stem achieves near-lossless accuracy at a 25% budget by using Token Position Decay (TPD) and an Output Awareness Metric (OAM). At the operator level, the open-source HPC Stem+BSA operator turns these sparsity gains into real hardware speedups, reducing first-token latency by 3.7 times with a 128K-token context. (Wide-angle observation)
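To make "a 25% budget" concrete, here is a hypothetical sketch of generic budgeted block-sparse causal attention for a single head. It is not the Stem algorithm: the article does not describe how TPD or OAM work, so the block-scoring rule below (dot products of mean-pooled query and key blocks) is an assumed stand-in, and all function names are illustrative. Each query block always keeps its own diagonal block plus the highest-scoring earlier key blocks, up to 25% of the key blocks it could causally attend to; all other blocks are skipped.

```python
import math
import random

# Hypothetical illustration of budgeted block-sparse attention (single head, causal).
# NOT the Stem algorithm: block scores use a simple mean-pooling heuristic as an
# assumed stand-in for Stem's TPD/OAM metrics, which the article does not detail.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vec(vectors):
    # Element-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def select_blocks(q, k, block_size, budget):
    """For each query block, choose which causal key blocks to compute.

    budget is the fraction of eligible key blocks kept (e.g. 0.25 = 25%).
    The diagonal block is always kept so every query can attend to itself.
    """
    n_blocks = len(q) // block_size
    q_means = [mean_vec(q[b * block_size:(b + 1) * block_size]) for b in range(n_blocks)]
    k_means = [mean_vec(k[b * block_size:(b + 1) * block_size]) for b in range(n_blocks)]
    kept = []
    for i in range(n_blocks):
        eligible = i + 1  # causal: key blocks 0..i
        n_keep = max(1, math.ceil(budget * eligible))
        # Rank earlier blocks by pooled query-key similarity (assumed heuristic).
        earlier = sorted(range(i), key=lambda j: dot(q_means[i], k_means[j]), reverse=True)
        kept.append(sorted([i] + earlier[:n_keep - 1]))
    return kept

def sparse_attention(q, k, v, block_size, budget):
    """Causal softmax attention restricted to the selected key blocks."""
    scale = 1.0 / math.sqrt(len(q[0]))
    kept = select_blocks(q, k, block_size, budget)
    out = []
    for t, q_t in enumerate(q):
        blocks = kept[t // block_size]
        # Key positions inside kept blocks, with the causal mask (s <= t) applied.
        idx = [s for j in blocks for s in range(j * block_size, (j + 1) * block_size) if s <= t]
        scores = [dot(q_t, k[s]) * scale for s in idx]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in scores]
        z = sum(weights)
        out.append([sum(w * v[s][c] for w, s in zip(weights, idx)) / z
                    for c in range(len(v[0]))])
    return out, kept

if __name__ == "__main__":
    random.seed(0)
    n, d, block_size = 32, 4, 4  # 8 blocks of 4 tokens each
    q, k, v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(n)] for _ in range(3))
    out, kept = sparse_attention(q, k, v, block_size, budget=0.25)
    print(len(kept[-1]))  # 2: the last query block keeps 25% of its 8 eligible key blocks
```

With `budget=1.0` every causal block is kept and the result matches dense causal attention. A pure-Python loop like this saves no wall-clock time by itself; skipped blocks only become real speedups when a fused kernel, such as the HPC Stem+BSA operator described above, avoids loading and computing them on the hardware.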