Yifan Zhang discloses DeepSeek V4's complete technical specifications: 1.6T parameters, 384 experts with 6 activated

ME News report: On April 22 (UTC+8), according to BlockBeats monitoring, Princeton PhD student Yifan Zhang posted technical details of DeepSeek V4 on X. He had teased "V4 next week" on April 19 and listed three architectural component names; tonight he provided the complete parameter table and, for the first time, disclosed the existence of a lightweight version, V4-Lite, with 285B parameters. V4 has a total parameter count of 1.6T. The attention mechanism is DSA2, which combines two sparse attention approaches DeepSeek has used before: DSA (DeepSeek Sparse Attention), introduced in V3.2, and NSA (Native Sparse Attention), proposed in a paper earlier this year. It uses a head dimension of 512 and is paired with Sparse MQA and SWA (Sliding Window Attention). The MoE layer has 384 experts in total, of which 6 are activated per token, and runs on a Fused MoE Mega-Kernel. The residual connections continue to use Hyper-Connections.
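For readers unfamiliar with this routing pattern, below is a minimal sketch of top-k MoE routing at the disclosed shape: 384 experts with 6 activated per token. Only those two numbers come from the post; the layer sizes, gating function, and expert MLPs here are hypothetical toy choices, and the actual DeepSeek V4 router, load-balancing scheme, and Fused MoE Mega-Kernel are not public.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    # Toy dimensions so the sketch runs anywhere; only the expert count
    # (384) and the number activated per token (6) follow the post.
    def __init__(self, d_model=64, d_ff=128, n_experts=384, k=6):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.w_in = nn.Parameter(torch.randn(n_experts, d_model, d_ff) * 0.02)
        self.w_out = nn.Parameter(torch.randn(n_experts, d_ff, d_model) * 0.02)

    def forward(self, x):                        # x: (n_tokens, d_model)
        logits = self.router(x)                  # (n_tokens, 384)
        gate, idx = logits.topk(self.k, dim=-1)  # choose 6 experts per token
        gate = F.softmax(gate, dim=-1)           # renormalize over the chosen 6
        out = torch.zeros_like(x)
        for slot in range(self.k):               # naive loop; a fused kernel batches this
            e = idx[:, slot]                     # expert id per token
            h = torch.einsum("td,tdf->tf", x, self.w_in[e]).relu()
            out += gate[:, slot:slot + 1] * torch.einsum("tf,tfd->td", h, self.w_out[e])
        return out

moe = TopKMoE()
print(moe(torch.randn(4, 64)).shape)             # torch.Size([4, 64])
```

The per-slot loop makes the sparsity explicit: each token touches only 6 of the 384 expert MLPs, which is why total parameters (1.6T) can dwarf the compute per token. A production fused kernel would gather tokens by expert and dispatch them in one batched call instead.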

Details disclosed for the first time about the training stage include: the optimizer is Muon (a matrix-level optimizer that applies Newton-Schulz orthogonalization to momentum updates), the pre-training context length is 32K, and the reinforcement learning phase uses GRPO with an added KL-divergence correction. The final context length is extended to 1M. The model is text-only. Zhang is not employed by DeepSeek, and DeepSeek officials have not responded to the above information. (Source: BlockBeats)
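For context on the optimizer claim, below is a minimal sketch of a Muon-style step following the publicly known recipe: keep an ordinary momentum buffer per 2-D weight matrix, then orthogonalize that buffer with a few Newton-Schulz iterations before applying it. The quintic coefficients and shape-based rescaling come from the public Muon reference implementation; whether DeepSeek's variant matches it exactly is not disclosed, and `muon_step` with its hyperparameters is illustrative.

```python
import torch

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G, i.e. push its singular values toward 1."""
    a, b, c = 3.4445, -4.7750, 2.0315        # quintic coefficients from public Muon
    X = G / (G.norm() + eps)                 # normalize so the iteration converges
    transposed = G.shape[0] > G.shape[1]
    if transposed:                           # iterate on the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style update for a 2-D weight matrix W, applied in place."""
    momentum.mul_(beta).add_(grad)           # ordinary SGD momentum buffer
    update = newton_schulz(momentum)         # orthogonalized search direction
    # shape-dependent rescaling used by the public reference implementation
    update *= max(1.0, W.shape[0] / W.shape[1]) ** 0.5
    W.add_(update, alpha=-lr)

W = torch.randn(256, 128)
buf = torch.zeros_like(W)
muon_step(W, torch.randn_like(W), buf)
print(W.shape)                               # torch.Size([256, 128])
```

The orthogonalization is what makes Muon "matrix-level": rather than scaling each coordinate independently as Adam does, it normalizes the whole update matrix's spectrum, which is cheap here because Newton-Schulz needs only a handful of matrix multiplies.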
