Congrats to the research team for advancing DeepSeek V3/R1 inference.

On NVIDIA GB200 NVL72, they're achieving 26k input tokens/s and 13k output tokens/s per GPU, a nearly 4× (input) and 5× (output) speedup vs H100.

They achieved this with NVFP4 MoE, FP8 attention, scaling-down expert parallelism