Moonshot AI extends Prefill/Decode decoupling to cross-data-center and heterogeneous hardware environments
ME News, April 18 (UTC+8): The Moonshot AI team recently announced that its Prefill/Decode decoupling technology has been extended from a single cluster to cross-data-center and heterogeneous hardware environments. According to the article, the move is expected to significantly reduce the inference cost per token. Extending the technique beyond a single cluster was previously hindered by KV cache transmission overhead. The breakthrough relies on the company's hybrid model, Kimi.