The strongest open-source model DeepSeek V4 is finally here! A 1.6-trillion-parameter model, MIT license, and a long-context KV cache compressed to one-tenth of V3.2's.

ME News report: On April 24 (UTC+8), according to monitoring by Beating, DeepSeek open-sourced a preview of its V4 series under the MIT license, with the weights already available on Hugging Face and ModelScope. The series includes two MoE models: V4-Pro has 1.6T total parameters with 49B (49 billion) activated per token, and V4-Flash has 284B (284 billion) total parameters with 13B (13 billion) activated. Both support a context of up to 1M tokens.
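The reported parameter counts make the sparsity of both MoE models easy to check. This short Python sketch computes the fraction of parameters activated per token and, using the FP4+FP8 storage format the article mentions for the weights, a rough lower and upper bound on raw weight size. It assumes every weight is stored in either FP4 (0.5 bytes) or FP8 (1 byte) and ignores quantization scale factors and any tensors kept at higher precision, so actual checkpoint sizes will differ.

```python
# Parameter counts as reported in the article: (total, activated per token).
MODELS = {
    "V4-Pro": (1.6e12, 49e9),
    "V4-Flash": (284e9, 13e9),
}

BYTES_FP4 = 0.5  # 4 bits per parameter
BYTES_FP8 = 1.0  # 8 bits per parameter


def activated_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def weight_size_bounds_gb(total: float) -> tuple[float, float]:
    """Rough (all-FP4, all-FP8) weight size in GB; ignores scale factors."""
    return total * BYTES_FP4 / 1e9, total * BYTES_FP8 / 1e9


for name, (total, active) in MODELS.items():
    low, high = weight_size_bounds_gb(total)
    print(f"{name}: {activated_fraction(total, active):.1%} active, "
          f"weights between {low:,.0f} GB and {high:,.0f} GB")
```

By this arithmetic V4-Pro activates roughly 3% of its weights per token and V4-Flash roughly 4.6%, which is why per-token compute is far lower than the headline parameter counts suggest.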

The architecture brings three upgrades. First, a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavy Compressed Attention (HCA) significantly reduces long-context overhead: at a 1M-token context, V4-Pro's per-token inference FLOPs are only 27% of V3.2's, and its KV cache (the GPU memory that stores historical information during inference) is only 10% of V3.2's. Second, manifold-constrained hyper-connections (mHC) replace traditional residual connections, improving the stability of signal propagation across layers. Third, training switches to the Muon optimizer to speed up convergence. The pre-training data exceeds 32T tokens.
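Muon is publicly described as momentum SGD in which the update for each 2D weight matrix is replaced by an approximately orthogonalized version computed with Newton-Schulz iterations. The following is a minimal pure-Python sketch of that idea, not DeepSeek's training code. It uses the classic cubic Newton-Schulz iteration, which converges to the polar factor U V^T of the update; the public Muon reference implementation instead uses a tuned quintic polynomial with fewer steps, and real optimizers add details such as Nesterov momentum and learning-rate scaling. The function names and the default `lr` and `beta` values are illustrative assumptions.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate the polar factor U V^T of g = U S V^T (cubic Newton-Schulz)."""
    norm = math.sqrt(sum(v * v for row in g for v in row))
    if norm == 0.0:
        return [row[:] for row in g]
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # inside the region where X <- 1.5 X - 0.5 X X^T X converges to U V^T.
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def muon_step(weight: Matrix, grad: Matrix, momentum: Matrix,
              lr: float = 0.02, beta: float = 0.95) -> tuple[Matrix, Matrix]:
    """One hypothetical Muon-style update: accumulate momentum, orthogonalize, step."""
    new_m = [[beta * m + g for m, g in zip(mr, gr)] for mr, gr in zip(momentum, grad)]
    update = orthogonalize(new_m)
    new_w = [[w - lr * u for w, u in zip(wr, ur)] for wr, ur in zip(weight, update)]
    return new_w, new_m
```

Orthogonalizing sets every nonzero singular value of the update to 1, so no single direction dominates the step; this is the intuition usually given for Muon's faster convergence.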

Post-training has two stages: first, domain-specific expert models are trained separately with SFT and GRPO reinforcement learning; then online distillation merges them into a single unified model. DeepSeek claims that V4-Pro-Max (the highest reasoning-effort mode) is the strongest open-source model currently available, with coding benchmarks in the top tier, and that its gap with cutting-edge closed-source systems on reasoning and agent tasks has narrowed significantly. Given a sufficient thinking budget, V4-Flash-Max reaches reasoning performance close to Pro, but its performance on pure-knowledge and complex agent tasks is limited by its smaller parameter scale. The weights are stored in FP4+FP8 mixed precision.
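GRPO, the reinforcement-learning method named for the first post-training stage, replaces a learned value critic with a group baseline: several responses are sampled for the same prompt, and each response's advantage is its reward minus the group mean, divided by the group's standard deviation. A minimal sketch of that advantage computation follows. The function name and `eps` are hypothetical, using the population standard deviation is an assumption, and the clipped policy-gradient loss and KL penalty that consume these advantages are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each sampled response's reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against all-equal rewards
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical responses to one prompt, scored 1 (correct) or 0 (wrong).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)
```

Correct answers receive positive advantages and wrong ones negative, with no separate critic network to train; when every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.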

(Source: BlockBeats)
