Yifan Zhang Reveals Complete Technical Specifications of DeepSeek V4: 1.6T Parameters, 384 Experts with 6 Activated

According to monitoring by Dongcha Beating, Princeton PhD student Yifan Zhang has posted technical details of DeepSeek V4 on X. On April 19 he previewed 'V4 next week' and named three architecture components; tonight he posted a complete parameter table and, for the first time, disclosed a lightweight variant, V4-Lite, with 285B parameters.

Per the table, V4 has 1.6T total parameters. The attention mechanism is DSA2, which combines two sparse-attention schemes: DSA (DeepSeek Sparse Attention), used in V3.2, and NSA (Native Sparse Attention), proposed in a paper earlier this year. The head dimension is 512, paired with sparse MQA and SWA (Sliding Window Attention). Each MoE layer has 384 experts in total, of which 6 are activated per token, executed via a fused MoE mega-kernel. Residual connections follow Hyper-Connections.

For the training phase, Zhang lists: the Muon optimizer (a matrix-level optimizer that applies Newton-Schulz orthogonalization to momentum updates), a pre-training context length of 32K, and a reinforcement-learning phase using GRPO with a KL-divergence correction added. The final context length is extended to 1M. The modality is text only. Illustrative sketches of the SWA masking, expert routing, Muon update, and GRPO objective appear below.

Zhang does not hold a position at DeepSeek, and DeepSeek has not responded to the above information.
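Of the attention components listed, only SWA has a simple generic form. As a point of reference (not DeepSeek's implementation; V4's window size and how SWA composes with DSA2 are undisclosed), a causal sliding-window mask looks like this in PyTorch:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where each query attends only to the most recent
    `window` keys, causally. Generic SWA illustration; window size here
    is arbitrary, as V4's actual value is not disclosed."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions, [seq_len, 1]
    j = torch.arange(seq_len).unsqueeze(0)   # key positions,   [1, seq_len]
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).int())  # lower-triangular band of width 3
```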
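The 384-experts / 6-activated figure describes standard top-k MoE routing. A minimal sketch under common conventions (softmax-over-top-k gating is an assumption; Zhang's post gives only the counts, and the fused mega-kernel is not modeled here):

```python
import torch
import torch.nn.functional as F

NUM_EXPERTS = 384   # total routed experts (figure from Zhang's table)
TOP_K = 6           # experts activated per token (figure from Zhang's table)

def route_tokens(hidden: torch.Tensor, router_weight: torch.Tensor):
    """Pick TOP_K of NUM_EXPERTS per token. hidden: [tokens, d_model],
    router_weight: [d_model, NUM_EXPERTS]. Gating details are assumed."""
    logits = hidden @ router_weight                   # [tokens, NUM_EXPERTS]
    top_logits, top_idx = logits.topk(TOP_K, dim=-1)  # 6 experts per token
    gates = F.softmax(top_logits, dim=-1)             # renormalized weights
    return top_idx, gates

tokens = torch.randn(8, 1024)            # toy dims; V4's d_model is unknown
w_router = torch.randn(1024, NUM_EXPERTS)
idx, gates = route_tokens(tokens, w_router)
print(idx.shape, gates.shape)            # [8, 6] and [8, 6]
```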
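Muon itself is public (Jordan et al.), so the "Newton-Schulz orthogonalization of momentum updates" can be shown concretely; whether V4's variant matches this reference recipe is unknown. The quintic coefficients below follow the public reference implementation:

```python
import torch

def newton_schulz_orthogonalize(m: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map a momentum matrix to the nearest semi-orthogonal
    matrix via the quintic Newton-Schulz iteration used in the public Muon
    optimizer. Coefficients are from Jordan et al.'s reference code."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = m / (m.norm() + 1e-7)        # normalize so the iteration converges
    if x.size(0) > x.size(1):        # iterate on the smaller Gram matrix
        x = x.T
    for _ in range(steps):
        g = x @ x.T
        x = a * x + (b * g + c * g @ g) @ x
    return x.T if m.size(0) > m.size(1) else x

# One Muon-style step for a 2-D weight (lr and momentum values are illustrative):
w = torch.randn(256, 512)
grad = torch.randn_like(w)
momentum = torch.zeros_like(w)
momentum = 0.95 * momentum + grad
w = w - 0.02 * newton_schulz_orthogonalize(momentum)
```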
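GRPO, introduced in DeepSeekMath, replaces a learned critic with group-relative advantages; the "KL divergence correction" is presumably the per-token penalty against a reference policy. A sketch assuming the public paper's clipped objective and k3 KL estimator (the beta and clip values are illustrative, not from Zhang's post):

```python
import torch

def grpo_loss(logp, logp_old, logp_ref, rewards, clip_eps=0.2, beta=0.04):
    """GRPO sketch for one prompt: logp/logp_old/logp_ref are per-response
    log-probs of shape [G], rewards is [G]. The group mean and std act as
    the baseline, so no value network is needed."""
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    ratio = torch.exp(logp - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    policy_term = torch.min(ratio * adv, clipped * adv)
    # k3 estimator of KL(pi_theta || pi_ref), as in the public GRPO paper
    log_r = logp_ref - logp
    kl = torch.exp(log_r) - log_r - 1
    return -(policy_term - beta * kl).mean()

G = 8                                       # sampled responses per prompt
logp = torch.randn(G, requires_grad=True)   # current policy log-probs
loss = grpo_loss(logp, torch.randn(G), torch.randn(G), torch.rand(G))
loss.backward()
```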
