Cambricon Completes DeepSeek-V4 Adaptation, Code Open-Sourced, Boosting Domestic Chip Stocks

According to monitoring by Dongcha Beating, Cambricon announced that it had completed adaptation of the 285B DeepSeek-V4-Flash and 1.6T DeepSeek-V4-Pro models on the day of the V4 release, with the adaptation code open-sourced on GitHub.

The rapid adaptation rests on two prerequisites. First, Cambricon's self-developed NeuWare software stack natively supports mainstream frameworks such as PyTorch and vLLM, allowing models to be migrated quickly. Second, Cambricon chips natively support mainstream low-precision data formats, so accuracy can be verified without additional format conversion.

For V4's new architecture, Cambricon accelerated specific modules such as Compressor and mHC with its self-developed fusion operator library Torch-MLU-Ops, and wrote kernels for hot operators such as sparse/compressed Attention and GroupGemm in BangC. At the inference-framework level, Cambricon's vLLM support covers five-dimensional mixed parallelism (TP/PP/SP/DP/EP), communication-computation overlap, low-precision quantization, and prefill-decode (PD) disaggregated deployment.

The V4 technical report mentions validation only on NVIDIA GPUs and Huawei Ascend NPUs, without covering the Cambricon platform; this adaptation was completed by Cambricon independently. Following news of the V4 release, domestic chip stocks in the A-share market strengthened, and Cambricon rose sharply during trading.
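The "accuracy verification without format conversion" point can be illustrated with a minimal, pure-Python sketch of symmetric per-tensor INT8 quantization and the round-trip error check such a verification typically involves. The function names and numbers below are illustrative only and are not NeuWare or Torch-MLU-Ops APIs.

```python
# Illustrative sketch (pure Python, no Cambricon APIs): symmetric
# per-tensor INT8 quantization and the round-trip accuracy check
# enabled when hardware natively consumes the low-precision format.

def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: returns (int codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int8(codes, scale):
    """Map INT8 codes back to floats for comparison against the original."""
    return [c * scale for c in codes]

weights = [0.31, -1.27, 0.05, 0.88, -0.42]   # toy weight block
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Accuracy verification: round-trip error stays within half a
# quantization step (scale / 2) for every element.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2
```

Because the chip consumes the INT8 codes directly, this comparison can be run as-is; a platform without native low-precision support would first need an extra conversion pass, which is the step the article says Cambricon avoids.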
