CoinWorld News: The Xiaomi Mimo team and Tilert, an AI compilation optimization systems group, have announced the launch of the Mimo-v2.5-pro-ultraspeed inference mode.
On a single standard 8-GPU general-purpose node, they achieved a generation speed of over 1,000 tokens/sec on a 1-trillion-parameter mixture-of-experts (MoE) model, peaking at about 1,200 tokens/sec.
This marks the first time that standard general-purpose hardware, combined with model-system co-design and without unconventional hardware such as wafer-scale integration or dedicated on-chip SRAM chips, has exceeded 1,000 tokens/sec in generation speed on a 1-trillion-parameter model.