Yifan Zhang discloses full DeepSeek V4 technical specifications: 1.6T parameters, 384 experts with 6 activated
According to Beating Monitoring, Princeton PhD student Yifan Zhang posted updated technical details of DeepSeek V4 on X. On April 19 he had previewed “V4 next week” and listed the names of three architecture components. Tonight he posted the complete parameter table and, for the first time, disclosed a lightweight version, V4-Lite, with 285B parameters.
V4 has 1.6T total parameters. Its attention mechanism, DSA2, combines two sparse attention schemes: DSA (DeepSeek Sparse Attention), which DeepSeek previously used in V3.2, and NSA (Native Sparse Attention), proposed in a paper earlier this year. Attention uses a head dimension of 512, together with Sparse MQA (multi-query attention) and SWA (Sliding Window Attention). Each MoE layer has 384 experts, of which 6 are activated per token, and is implemented with the Fused MoE Mega-Kernel. Residual connections use Hyper-Connections.
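The routing step behind “384 experts, 6 activated” can be illustrated with a top-k gate. This is a minimal, hypothetical sketch: the report does not say how V4 scores or balances experts, so it uses a generic softmax-then-top-k gate with renormalized weights (DeepSeek-V3, for comparison, scored experts with sigmoid affinities). The function names and the toy callable experts are illustrative only, and the sketch says nothing about the fused mega-kernel, which concerns how the computation is scheduled on hardware rather than what it computes. It also makes the sparsity arithmetic concrete: 6 of 384 routed experts is 1/64, about 1.56% of a layer's experts per token.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts per MoE layer (reported V4 figure)
TOP_K = 6          # experts activated per token (reported V4 figure)
ACTIVE_FRACTION = TOP_K / NUM_EXPERTS  # 6 / 384 = 1/64 = 0.015625


def route_token(logits, k=TOP_K):
    """Return (expert_id, gate_weight) pairs for the k best-scoring experts.

    Generic softmax top-k gate: an illustrative assumption, not DeepSeek's router.
    """
    # Softmax over all expert logits, shifted by the max for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k experts with the highest probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected gate weights sum to 1.
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


def moe_forward(x, experts, logits, k=TOP_K):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in route_token(logits, k))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy experts: expert i simply scales its input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route_token(logits)
    print("selected experts:", [i for i, _ in chosen])
    print("output:", moe_forward(1.0, experts, logits))
    print("active fraction of routed experts:", ACTIVE_FRACTION)
```

The 1/64 ratio applies to routed experts only: attention, embedding, and other dense parameters run for every token, so the per-token active share of the 1.6T total is not simply 1.6T divided by 64.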
Training details disclosed for the first time include: the optimizer is Muon (a matrix-level optimizer that applies Newton-Schulz orthogonalization to momentum updates); the pre-training context length is 32K; and the reinforcement learning stage uses GRPO with a KL-divergence correction. The context length is ultimately extended to 1M, and the model is text-only.
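The Newton-Schulz orthogonalization in the Muon description can be illustrated in plain Python. This is a simplified sketch, not Muon's reference code: it swaps in the classic cubic iteration X <- 1.5 X - 0.5 (X X^T) X, which drives every singular value of the momentum matrix toward 1 and so converges to its orthogonal polar factor U V^T, whereas the public Muon implementation uses a tuned quintic polynomial with only a few steps. The helper names, the step count, and the `lr`/`beta` values in `muon_step` are illustrative assumptions, and refinements found in public implementations, such as Nesterov momentum and shape-dependent learning-rate scaling, are omitted.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def newton_schulz_orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate the orthogonal polar factor U V^T of g = U S V^T.

    Classic cubic Newton-Schulz iteration (a stand-in for Muon's tuned quintic).
    """
    # Divide by the Frobenius norm so every nonzero singular value lies in (0, 1],
    # inside the cubic iteration's convergence range (0, sqrt(3)).
    norm = math.sqrt(sum(v * v for row in g for v in row)) or 1.0
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 (X X^T) X pushes each singular value toward 1
        # while leaving the singular vectors unchanged.
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxt_x[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def muon_step(weight: Matrix, grad: Matrix, momentum: Matrix,
              lr: float = 0.02, beta: float = 0.95) -> tuple[Matrix, Matrix]:
    """One simplified Muon-style update for a 2-D weight (illustrative hyperparameters)."""
    # Accumulate momentum, then orthogonalize the momentum matrix, not the raw gradient.
    momentum = [[beta * m + g for m, g in zip(mr, gr)] for mr, gr in zip(momentum, grad)]
    update = newton_schulz_orthogonalize(momentum)
    weight = [[w - lr * u for w, u in zip(wr, ur)] for wr, ur in zip(weight, update)]
    return weight, momentum


if __name__ == "__main__":
    # A symmetric positive-definite matrix has the identity as its polar factor.
    print(newton_schulz_orthogonalize([[2.0, 1.0], [1.0, 3.0]]))
```

Because the orthogonalized update has all singular values near 1, each step moves the weight matrix by a similar amount along every singular direction, regardless of how lopsided the raw momentum was. The symmetric positive-definite example above is a convenient sanity check, since its result should be very close to the 2x2 identity.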
Zhang does not work at DeepSeek, and DeepSeek has not officially responded to this information.