"Huawei chips slowed down the DeepSeek V4 launch"? The same scheme runs on both Nvidia and Ascend, with speedups of nearly 2x.

ME News report, April 24 (UTC+8): According to monitoring by Dongcha Beating, a rumor circulated widely in the community before the release of DeepSeek V4, claiming that the launch came later than expected because the model ran into adaptation difficulties when migrating from NVIDIA to the Huawei Ascend platform. The V4 technical report does not respond to the rumor directly, but the performance data it discloses contradicts it. According to the report, V4's Fine-Grained EP Scheme has been deployed and verified on both NVIDIA GPU and Huawei Ascend NPU platforms. It delivers a 1.50x to 1.73x speedup on regular inference workloads, and up to 1.96x in latency-sensitive scenarios such as RL rollout and high-speed Agent services. The team has also open-sourced the CUDA kernel of MegaMoE as part of DeepGEMM. In other words, V4 achieved efficiency close to the theoretical limit on both hardware platforms, and cross-platform adaptation caused no performance loss. (Source: BlockBeats)