DeepSeek-V4-Flash launched on Huawei Cloud

On April 24th, the DeepSeek-V4 model was officially released and open-sourced, and Huawei Cloud was the first to adapt it.
For DeepSeek-V4, Huawei Cloud was the first to adapt the model-layer attention compression mechanism, enabling efficient KVCache allocation and management under V4's attention mechanism. It also provides more than 10 high-performance fused operators, such as TopK, SWA, and CFA.
Combined with framework-level optimizations such as asynchronous scheduling and MTP multi-step speculative decoding, this enables high-performance inference with a native 1M-token context.
Huawei Cloud’s MaaS (Model as a Service) platform now offers developers a token-based service for invoking the DeepSeek-V4-Flash API in one click, with no deployment required.
