📰 The most powerful open-source model, DeepSeek V4, is finally here! A 1.6-trillion-parameter model, MIT licensed, with long-context VRAM usage compressed to one-tenth of V3.2's.


According to monitoring by Beating, preview weights for the DeepSeek open-source V4 series, licensed under MIT, are now available on Hugging Face and ModelScope. The series includes two MoE models: V4-Pro, with 1.6T total parameters and 49B activated per token, and V4-Flash, with 284B total parameters and 13B activated. Both models support a 1-million-token context. The architecture brings three upgrades: a hybrid attention mechanism (compressed sparse attention, CSA, plus heavily compressed attention, HCA) significantly reduces long-context overhead, enabling V4 to handle 1 million tokens...
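The "total vs. activated" split is the defining property of an MoE model: a router picks a few experts per token, so only a small slice of the full parameter count does work on any one token. Below is a minimal toy sketch of that idea in Python/PyTorch. It is not DeepSeek's code, and every number in it (64-dim hidden size, 8 experts, top-2 routing) is made up purely for illustration.

```python
# Toy MoE layer: illustrates why "activated" parameters per token are a
# small fraction of "total" parameters. Not DeepSeek's implementation.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Each token is processed only by its top-k experts; the rest stay idle.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer()
x = torch.randn(10, 64)
print(layer(x).shape)                           # torch.Size([10, 64])
total = sum(p.numel() for p in layer.parameters())
print(f"total params: {total}, experts used per token: {layer.top_k}/{len(self := layer.experts) if False else len(layer.experts)}")
```

In V4-Pro's case the reported ratio is roughly 49B activated out of 1.6T total, i.e. only a few percent of the weights run per token, which is how a 1.6T model keeps inference cost closer to that of a much smaller dense model.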
Brothers, DeepSeek is stirring things up again! The V4 model, 1.6 trillion parameters, open-sourced under the MIT license, compresses long-context VRAM to just one-tenth of V3.2's. This is a real technological revolution, not those trash projects hyping air coins.
$FET $AGIX Can these AI concept coins ride this wave and take off this time? Old fans know Soroge gets most annoyed by grand, vague promises, but real technological breakthroughs have to be seized decisively. Don't just stand there dazed waiting for institutions to pump the market. 👇👇👇👇👇