DeepSeek V4 Released: 1.6T-Parameter Flagship Supports 1M Context, Inference Compute Only 27% of V3.2

ME News: On April 24 (UTC+8), according to monitoring by Dongcha Beating, DeepSeek open-sourced a preview of its V4 series under the MIT license, with the weights uploaded to Hugging Face and ModelScope.

The series includes two MoE models: V4-Pro, with 1.6T total parameters and 49B (49 billion) activated per token; and V4-Flash, with 284B (284 billion) total parameters and 13B (13 billion) activated per token.
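To put the MoE sparsity in perspective, the short sketch below computes what share of each model's parameters is active for a single token. It uses only the parameter counts reported above; the names `MODELS` and `active_fraction` are illustrative, not part of any DeepSeek tooling.

```python
# Parameter counts as reported in the article (total vs. activated per token).
MODELS = {
    "V4-Pro": {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9, "active": 13e9},
}


def active_fraction(total: float, active: float) -> float:
    """Return the share of total parameters activated for one token."""
    if total <= 0 or active < 0 or active > total:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active / total


for name, params in MODELS.items():
    share = active_fraction(params["total"], params["active"])
    print(f"{name}: {share:.2%} of parameters active per token")
```

By this arithmetic, V4-Pro activates roughly 3% of its weights per token and V4-Flash roughly 4.6%.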

Both models support a 1M-token context window.

The architecture features three upgrades. First, a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavy Compression Attention (HCA) significantly reduces long-context overhead: at a 1M-token context, V4-Pro's per-token inference FLOPs are only 27% of V3.2's, and its KV cache (the GPU memory used to store historical information during inference) is only 10% of V3.2's. Second, manifold-constrained hyper-connections (mHC) replace traditional residual connections, improving the stability of signal propagation across layers. Third, training has switched to the Muon optimizer to accelerate convergence.
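The two ratios quoted for the 1M-token setting can be read as reduction factors. The sketch below does that conversion; the 100 GB baseline KV cache is a purely hypothetical figure chosen for illustration, since the article gives no absolute numbers for V3.2.

```python
# Ratios reported in the article for V4-Pro relative to V3.2 at 1M-token context.
FLOPS_RATIO = 0.27     # per-token inference FLOPs
KV_CACHE_RATIO = 0.10  # KV cache size


def reduction_factor(ratio: float) -> float:
    """Convert a 'fraction of baseline' ratio into an 'N times smaller' factor."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return 1.0 / ratio


def scaled_cost(baseline: float, ratio: float) -> float:
    """Scale a baseline cost (any unit) by the reported ratio."""
    return baseline * ratio


# Hypothetical baseline: assume V3.2 needed 100 GB of KV cache for some workload.
HYPOTHETICAL_V32_KV_GB = 100.0

print(f"FLOPs reduction:    {reduction_factor(FLOPS_RATIO):.1f}x")
print(f"KV cache reduction: {reduction_factor(KV_CACHE_RATIO):.1f}x")
print(f"KV cache under the assumed baseline: "
      f"{scaled_cost(HYPOTHETICAL_V32_KV_GB, KV_CACHE_RATIO):.1f} GB")
```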

Pretraining data exceeds 32T tokens.

Post-training is carried out in two stages: first, SFT and GRPO reinforcement learning are used to train a separate expert for each domain; then online distillation merges these experts into a single model.
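The article does not describe DeepSeek's V4 training recipe in detail. As general background, the core idea of GRPO as publicly described is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The following is a minimal sketch of that group-relative advantage step only; the function name, the use of population standard deviation, and the `eps` term are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat the group average get a positive advantage and the rest negative.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt (1.0 = correct).
example_rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(example_rewards))
```

If every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.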

V4-Pro-Max (the maximum inference-effort mode) is claimed to be the strongest open-source model currently available, achieving top-tier performance on coding benchmarks and significantly narrowing the gap with closed-source leaders on reasoning and agent tasks.

V4-Flash-Max, when given a sufficient thinking budget, delivers reasoning performance close to that of Pro, but is limited by its parameter scale on pure knowledge and complex agent tasks.

The weights are stored using FP4+FP8 mixed precision.
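For a rough sense of what mixed FP4+FP8 storage implies for checkpoint size, the sketch below bounds the raw weight footprint. The article does not say what share of weights is stored in each format, so `fp4_fraction` is a free, hypothetical parameter, and the estimate ignores quantization scale factors and any tensors kept at higher precision.

```python
# Storage cost per parameter: FP4 uses 4 bits (0.5 bytes), FP8 uses 8 bits (1 byte).
BYTES_PER_PARAM = {"fp4": 0.5, "fp8": 1.0}


def weight_bytes(total_params: float, fp4_fraction: float) -> float:
    """Estimate raw weight storage for a given FP4/FP8 split.

    fp4_fraction is the (unknown, assumed) share of parameters stored in FP4;
    the remainder is assumed to be stored in FP8.
    """
    if not 0.0 <= fp4_fraction <= 1.0:
        raise ValueError("fp4_fraction must be in [0, 1]")
    fp4_params = total_params * fp4_fraction
    fp8_params = total_params - fp4_params
    return fp4_params * BYTES_PER_PARAM["fp4"] + fp8_params * BYTES_PER_PARAM["fp8"]


# Total parameter counts as reported in the article.
for name, total in {"V4-Pro": 1.6e12, "V4-Flash": 284e9}.items():
    low = weight_bytes(total, fp4_fraction=1.0) / 1e9   # all FP4: lower bound
    high = weight_bytes(total, fp4_fraction=0.0) / 1e9  # all FP8: upper bound
    print(f"{name}: between {low:,.0f} GB and {high:,.0f} GB of raw weights")
```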

(Source: BlockBeats)
