📰 【The Most Powerful Open-Source Model, DeepSeek V4, Is Finally Here! 1.6 Trillion Parameters, MIT License, Long-Context GPU Memory Compressed to One-Tenth of V3.2】

According to monitoring by Beating, the preview of DeepSeek's open-source V4 series is released under the MIT license, with weights now available on Hugging Face and ModelScope. The series includes two MoE models: V4-Pro has 1.6T total parameters with 49B (49 billion) activated per token; V4-Flash has 284B (284 billion) total parameters with 13B (13 billion) activated per token. Both support a 1M-token context. The architecture brings three upgrades, the first being a hybrid attention mechanism (compressed sparse attention, CSA, plus heavily compressed attention, HCA) that significantly reduces long-context overhead. In the 1M context, V4...
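To put the headline numbers in perspective, here is a rough back-of-envelope sketch in Python. Only the parameter counts and the roughly 10x compression ratio come from the report above; the layer count, head count, and head dimension in the sketch are illustrative assumptions, since the post does not give V4's actual config.

```python
# Back-of-envelope sketch (not DeepSeek's code): estimates the KV-cache
# footprint of a 1M-token context and the effect of a ~10x compression ratio.

def kv_cache_gib(num_layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elem: int = 2) -> float:
    """Uncompressed KV-cache size in GiB: 2 tensors (K and V) per layer."""
    total_bytes = 2 * num_layers * kv_heads * head_dim * seq_len * bytes_per_elem
    return total_bytes / 2**30

# Hypothetical config in the range of a large MoE model (assumed, not published).
baseline = kv_cache_gib(num_layers=60, kv_heads=8, head_dim=128, seq_len=1_000_000)
compressed = baseline / 10  # the reported "one-tenth of V3.2" ratio

print(f"uncompressed 1M-token KV cache: {baseline:.1f} GiB")
print(f"at ~10x compression:            {compressed:.1f} GiB")

# Activated-parameter ratios quoted in the post:
for name, total_b, active_b in [("V4-Pro", 1600, 49), ("V4-Flash", 284, 13)]:
    print(f"{name}: {active_b}B / {total_b}B active per token "
          f"= {active_b / total_b:.1%}")
```

The point of the MoE arithmetic is that V4-Pro routes only about 3% of its 1.6T parameters per token, which is how a model this large stays serveable at all.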

Brothers, DeepSeek has made another big move: its open-source V4 comes in at 1.6 trillion parameters, and the MIT license means you can use it freely. They are really bold, handing out large models at cabbage prices for retail AI users to hop on board. Don't miss this wave of technological dividends the way you missed the King of Crypto back then.