General-purpose GPUs run a 1T-parameter MoE model at over 1,000 tokens/sec; this co-design approach has real potential.

CoinNetwork
CoinWorld News: Xiaomi's Mimo team and Tilert, an AI compilation optimization system group, have announced the launch of the Mimo-v2.5-pro-ultraspeed inference mode.
On a single standard 8-GPU general-purpose node, they report a generation speed of over 1,000 tokens/sec on a 1-trillion-parameter mixture-of-experts (MoE) model, with a peak of about 1,200 tokens/sec.
This marks the first time that standard general-purpose hardware combined with model-system co-design, without unconventional hardware such as wafer-scale integration or dedicated on-chip SRAM chips, has exceeded 1,000 tokens/sec of generation speed on a 1-trillion-parameter model.