CryptoWorld News reports that DeepSeek has released the V4 series. The flagship model has 1.6T total parameters, supports a 1M-token context window, and requires only 27% of the inference compute of V3.2.

This series includes two MoE models: v4-pro has 1.6T total parameters and activates 49B per token; v4-flash has 284B total parameters and activates 13B per token.
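The sparsity implied by these figures can be made concrete with a quick calculation, a minimal sketch using only the parameter counts quoted above (the model names here are just dictionary keys, not an official API):

```python
# Fraction of weights active per token for the two MoE models,
# computed from the totals and per-token activation figures in the article.

models = {
    "v4-pro":   {"total": 1.6e12, "active": 49e9},
    "v4-flash": {"total": 284e9,  "active": 13e9},
}

for name, p in models.items():
    pct = 100 * p["active"] / p["total"]
    print(f"{name}: {pct:.1f}% of weights active per token")
```

Roughly 3% of v4-pro's weights and 5% of v4-flash's weights are used for any given token, which is how MoE models keep per-token compute far below what their total parameter counts suggest.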

Architecture upgrades include a hybrid attention mechanism that significantly reduces the overhead of long-context processing; per-token inference FLOPS for v4-pro are only 27% of V3.2's.

Pretraining used more than 32T tokens, and post-training was conducted in two phases. v4-pro-max is claimed to be the strongest open-source model currently available, with inference performance approaching frontier proprietary models.

The weights are stored in mixed FP4 and FP8 precision.
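Mixed-precision storage matters mainly for checkpoint size: FP4 costs 0.5 bytes per parameter and FP8 costs 1 byte. The article does not say which layers use which format, so the 80/20 split below is purely an illustrative assumption:

```python
# Rough checkpoint-size estimate for mixed FP4 + FP8 weights.
# The fp4_fraction value is an assumption for illustration only;
# the article does not state the actual FP4/FP8 split.

def checkpoint_size_gb(total_params: float, fp4_fraction: float) -> float:
    """FP4 = 0.5 bytes/param, FP8 = 1 byte/param."""
    fp4_bytes = total_params * fp4_fraction * 0.5
    fp8_bytes = total_params * (1 - fp4_fraction) * 1.0
    return (fp4_bytes + fp8_bytes) / 1e9

# v4-pro at 1.6T total parameters, assuming 80% of weights in FP4:
print(f"{checkpoint_size_gb(1.6e12, 0.8):.0f} GB")
```

Under that assumption the 1.6T-parameter model would occupy on the order of 1 TB on disk, versus about 3.2 TB if stored in FP16, which is why low-precision formats are attractive for distributing open weights.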