Cambricon completes DeepSeek-V4 adaptation and open-sources the code, boosting domestic chip stocks


According to Beating Monitoring, Cambricon announced that on the day of the V4 release it had completed adaptation of two models, the 285B DeepSeek-V4-Flash and the 1.6T DeepSeek-V4-Pro, on the vLLM inference framework, and open-sourced the adaptation code on GitHub.

The speed of the adaptation rests on two prerequisites: first, Cambricon's self-developed NeuWARE software stack natively supports mainstream frameworks such as PyTorch and vLLM, enabling rapid model migration; second, Cambricon chips natively support mainstream low-precision data formats, so accuracy can be verified without an additional format-conversion pass. For V4's new architecture, Cambricon accelerated modules such as Compressor and mHC through its self-developed fused-operator library Torch-MLU-Ops, and wrote kernels in BangC for hotspot operators such as sparse/compressed Attention and GroupGemm.
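The article does not reproduce any of the adaptation code, but a minimal sketch of what framework-level migration typically looks like, assuming the torch_mlu plugin registers an "mlu" device with PyTorch as in Cambricon's public integration (all names and shapes below are illustrative, not taken from the open-sourced repository):

```python
# Hypothetical sketch: running a PyTorch module on a Cambricon MLU.
# Assumes the torch_mlu plugin is installed; importing it registers
# the "mlu" backend with PyTorch. Nothing here is from the actual
# DeepSeek-V4 adaptation code.
import torch
import torch_mlu  # noqa: F401  (side effect: registers the "mlu" device)

model = torch.nn.Linear(4096, 4096)

# Native framework support means migration is mostly a device swap:
model = model.to("mlu").half()  # low-precision weights, no extra conversion
x = torch.randn(8, 4096, dtype=torch.half, device="mlu")

with torch.no_grad():
    y = model(x)  # executes on the MLU through the NeuWARE stack
print(y.shape)
```

The point of the sketch is the claim in the paragraph above: when the software stack plugs into PyTorch natively, porting a model is closer to changing a device string than to rewriting the model.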

At the inference-framework level, Cambricon supports five-dimensional hybrid parallelism (TP/PP/SP/DP/EP), communication-computation overlap, low-precision quantization, and prefill-decode (PD) disaggregated deployment within vLLM. Notably, the V4 technical report only mentions validation on NVIDIA GPUs and Huawei Ascend NPUs, without covering Cambricon platforms; this adaptation was completed by Cambricon independently. Buoyed by the V4 release, the A-share domestic chip sector strengthened, and Cambricon's stock price surged intraday.
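The article does not show the deployment configuration behind the parallelism claims above. As a rough sketch, this is how some of the named dimensions map onto vLLM's standard offline API; the model id and sizes are placeholders, the parameter names are upstream vLLM arguments, and any MLU-specific flags from Cambricon's adaptation are not shown:

```python
# Illustrative vLLM configuration using standard upstream parameters;
# model id and parallel sizes are placeholders, not from the article.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical model id
    tensor_parallel_size=8,       # TP: shard each layer's weights
    pipeline_parallel_size=2,     # PP: split the layer stack across nodes
    quantization="fp8",           # low-precision deployment
    # EP/DP/SP and prefill-decode disaggregation are configured through
    # additional engine arguments that vary by vLLM version.
)

outputs = llm.generate(
    ["Explain mixture-of-experts routing in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```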
