According to Beating Monitoring, Princeton PhD student Yifan Zhang posted updated technical details of DeepSeek V4 on X. On April 19 he had teased “V4 next week” and named three architecture components. Tonight he published the complete parameter table and disclosed, for the first time, a lightweight version, V4-Lite, with 285B parameters.

V4's total parameter count is 1.6T. The attention mechanism is DSA2, which combines two sparse attention schemes: DSA (DeepSeek Sparse Attention), which DeepSeek previously used in V3.2, and NSA (Native Sparse Attention), proposed in a paper earlier this year. The head dimension is 512, paired with Sparse MQA and SWA (Sliding Window Attention). The MoE layer has 384 experts, 6 of which are activated per token, and uses the Fused MoE Mega-Kernel. The residual connections use Hyper-Connections.
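
To make the "384 experts, 6 activated" figure concrete, here is a minimal sketch of top-k expert routing in plain Python. It is illustrative only: the post does not describe V4's actual gating function, so the softmax-over-selected-logits convention, the random stand-in router scores, and the names `route_token`, `NUM_EXPERTS`, and `TOP_K` are all assumptions made for this example.

```python
import math
import random

NUM_EXPERTS = 384  # expert count reported in the post
TOP_K = 6          # experts activated per token, as reported


def route_token(router_logits, top_k=TOP_K):
    """Return [(expert_index, gate_weight), ...] for the top_k highest-scoring experts.

    Gate weights are a softmax over the selected logits only. This is one
    common convention, assumed here; V4's real gating is not disclosed.
    """
    # Indices of the top_k largest router scores.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # hypothetical router scores
assignment = route_token(logits)

# Fraction of experts that run for a single token: 6 / 384.
active_fraction = TOP_K / NUM_EXPERTS
```

With these figures only 6 of 384 experts (1/64) run for any given token, which is why a model with a very large total parameter count can have a much smaller per-token compute cost.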

Training-side details disclosed for the first time include: the optimizer is Muon (a matrix-level optimizer that applies Newton-Schulz orthogonalization to momentum updates), the pre-training context length is 32K, and the reinforcement learning stage uses GRPO with a KL-divergence correction. The final context length is extended to 1M. The model is text-only.
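
The Newton-Schulz step mentioned above can be sketched in a few lines of plain Python. This is a hedged illustration, not DeepSeek's or Muon's code: it uses the textbook cubic iteration X ← 1.5·X − 0.5·X·Xᵀ·X, whereas public Muon implementations typically use a tuned polynomial variant run for only a few steps. The helper names and the sample `momentum` matrix are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(m, steps=12):
    """Approximate the nearest (semi-)orthogonal matrix to a full-rank m.

    Uses the classic cubic Newton-Schulz iteration, which is an assumption
    for this sketch rather than the exact polynomial used by Muon.
    """
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    fro = sum(v * v for row in m for v in row) ** 0.5
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        # Each step pushes every singular value of x toward 1.
        x = [[1.5 * p - 0.5 * q for p, q in zip(rp, rq)] for rp, rq in zip(x, xxtx)]
    return x


# Hypothetical 2x3 momentum matrix standing in for one layer's update.
momentum = [[3.0, 1.0, 0.0], [0.5, 2.0, 1.0]]
update = newton_schulz_orthogonalize(momentum)
gram = matmul(update, transpose(update))  # close to the 2x2 identity
```

The effect is that the update keeps the directions of the momentum matrix but equalizes its singular values, so no single direction dominates the step.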

Zhang does not work at DeepSeek, and DeepSeek has not officially responded to this information.

Yifan Zhang discloses DeepSeek V4 complete technical specifications: 1.6T parameters, 384 experts with 6 activated