Yifan Zhang discloses complete DeepSeek V4 technical specifications: 1.6T parameters, 384 experts with 6 activated
ME News report: On April 22 (UTC+8), according to BlockBeats monitoring, Princeton PhD student Yifan Zhang posted technical details of DeepSeek V4 on X. He had teased "V4 next week" on April 19 and listed the names of three architectural components. Tonight he provided the complete parameter table and, for the first time, disclosed the existence of a lightweight version, V4-Lite, with 285B parameters. V4 has a total parameter count of 1.6T. The attention mechanism is DSA2, which combines two sparse attention approaches DeepSeek has used or proposed before: DSA (DeepSeek Sparse Attention), used in V3.2, and NSA (Native Sparse Attention), proposed in a paper earlier this year. It uses a head dimension of 512, paired with Sparse MQA and SWA (Sliding Window Attention). The MoE layer has 384 experts in total, activates 6 each time, and uses a Fused MoE Mega-Kernel. The residual connections continue to use Hyper-Connections.
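The reported expert configuration (384 experts, 6 active per token) can be illustrated with a generic top-k MoE router. The following is a minimal sketch assuming a standard softmax-gated router and toy layer sizes; it is not DeepSeek's implementation, and a fused kernel such as the reported Fused MoE Mega-Kernel would replace the naive per-token loop below.

```python
import torch

# Minimal top-k MoE routing sketch matching the reported configuration:
# 384 experts, 6 activated per token. Hidden sizes are toy placeholders.
NUM_EXPERTS = 384
TOP_K = 6
HIDDEN = 64  # placeholder; the real hidden size is not disclosed

router = torch.nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)
experts = torch.nn.ModuleList(
    torch.nn.Sequential(
        torch.nn.Linear(HIDDEN, 4 * HIDDEN),
        torch.nn.GELU(),
        torch.nn.Linear(4 * HIDDEN, HIDDEN),
    )
    for _ in range(NUM_EXPERTS)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Route each token to its top-6 experts and mix their outputs."""
    logits = router(x)                                     # [tokens, 384]
    weights, idx = torch.topk(logits.softmax(dim=-1), TOP_K, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the chosen experts
    out = torch.zeros_like(x)
    for t in range(x.size(0)):        # naive loop; a fused kernel would batch this work
        for k in range(TOP_K):
            out[t] += weights[t, k] * experts[int(idx[t, k])](x[t])
    return out

tokens = torch.randn(4, HIDDEN)
print(moe_forward(tokens).shape)      # torch.Size([4, 64])
```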
Training details disclosed for the first time include the Muon optimizer (a matrix-level optimizer that applies Newton-Schulz orthogonalization to momentum updates), a pre-training context length of 32K, and the use of GRPO with an added KL divergence correction during the reinforcement learning phase. The final context length is extended to 1M. The model is text-only. Zhang is not employed by DeepSeek, and DeepSeek officials have not responded to the above information. (Source: BlockBeats)
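The post describes Muon only as a matrix-level optimizer that orthogonalizes momentum updates via Newton-Schulz iteration. The sketch below follows the publicly released Muon reference implementation (a quintic iteration with coefficients 3.4445, -4.7750, 2.0315); whether DeepSeek uses the same constants, learning rate, or momentum schedule is not stated, so those values here are illustrative.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D matrix with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients from the public Muon reference code
    X = G / (G.norm() + 1e-7)           # scale so the spectral norm is at most ~1
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T                          # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_style_step(weight, grad, momentum, beta=0.95, lr=0.02):
    """One illustrative Muon-style update: accumulate momentum, orthogonalize, apply."""
    momentum.mul_(beta).add_(grad)
    weight.add_(newton_schulz_orthogonalize(momentum), alpha=-lr)

# Toy usage on a single weight matrix.
W = torch.randn(128, 64)
m = torch.zeros_like(W)
g = torch.randn_like(W)
muon_style_step(W, g, m)
```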