"Huawei chips delayed the launch of DeepSeek V4"? The same kernel scheme runs on both NVIDIA and Ascend, with speedups of nearly 2x.

According to Beating Monitoring, a rumor circulated widely in the community before the release of DeepSeek V4: the launch came later than expected because the model ran into adaptation difficulties when migrating from NVIDIA to Huawei's Ascend platform.
The V4 technical report does not address this rumor directly, but the performance data it discloses contradicts it.

The report shows that V4's fine-grained expert parallelism scheme (Fine-Grained EP Scheme) has been deployed and validated on both NVIDIA GPUs and Huawei Ascend NPUs, delivering speedups of 1.50x to 1.73x on general inference workloads and up to 1.96x in latency-sensitive scenarios such as RL rollout and high-speed agent serving.
The team has also open-sourced the CUDA-based MegaMoE kernel as part of DeepGEMM. In other words, V4 reaches near-theoretical peak efficiency on both hardware platforms, and cross-platform adaptation did not cost performance.
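
To put the reported figures in perspective, here is a minimal sketch that converts a speedup factor into the share of time saved. It assumes that "speedup" means baseline time divided by accelerated time, which the report summary does not spell out; the function name `time_saved_pct` is hypothetical and used only for illustration.

```python
def time_saved_pct(speedup: float) -> float:
    """Percent of baseline time saved, assuming speedup = t_baseline / t_new."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) * 100.0


# Speedup factors quoted in the article.
for factor in (1.50, 1.73, 1.96):
    print(f"{factor:.2f}x speedup -> {time_saved_pct(factor):.1f}% less time")
```

Under that assumption, a 1.96x speedup means the same work finishes in roughly half the original time.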
