📰 【DeepSeek V4 Released: 1.6T-Parameter Flagship Supports a 1M Context Window, Inference Compute Only 27% of V3.2】


According to monitoring by Beating, DeepSeek has released a preview of the V4 series under the MIT license, with weights already live on Hugging Face and ModelScope. The series includes two MoE models: V4-Pro, with 1.6T (1.6 trillion) total parameters and 49B (49 billion) activated per token; and V4-Flash, with 284B (284 billion) total parameters and 13B (13 billion) activated per token. Both support a 1M-token context window. The architecture has three upgrades: a hybrid attention mechanism (Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA)) that significantly reduces long-context overhead, enabling V4...
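For a rough sense of how sparse these MoE models are, the per-token activation share can be worked out from the parameter counts quoted above. This is only a back-of-the-envelope sketch: the figures are the ones cited in this post, and the helper name `active_fraction` is made up for illustration, not part of any DeepSeek tooling.

```python
# Parameter counts as quoted in the post (not independently verified).
MODELS = {
    "V4-Pro": {"total_params": 1.6e12, "active_params": 49e9},
    "V4-Flash": {"total_params": 284e9, "active_params": 13e9},
}


def active_fraction(total_params: float, active_params: float) -> float:
    """Hypothetical helper: share of an MoE model's parameters used per token."""
    return active_params / total_params


for name, spec in MODELS.items():
    frac = active_fraction(spec["total_params"], spec["active_params"])
    print(f"{name}: {frac:.1%} of parameters active per token")
# V4-Pro: 3.1% of parameters active per token
# V4-Flash: 4.6% of parameters active per token
```

The activation share says nothing by itself about the 27% inference-compute figure, which also depends on the attention changes and context length.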
What the hell, man! DeepSeek V4 is out to cut the market manipulators' compute costs to the bone! Of its 1.6T parameters it activates only 49B per token, and its inference compute is just 27% of V3.2's. Isn't that basically handing all of us a compute "cheat code"? Competition in the AI sector is about to get so brutal that nobody will recognize the field by the end! Folks, keep a close eye on the weights on Hugging Face: once this wave of tech dividends lands, the projects that make money by piling up compute and fleecing people are all going to get flattened! Damn it, if we don't rush in now, aren't we just waiting to be the market manipulators' bagholders? 👇👇👇👇👇