Congrats to the research team for advancing DeepSeek V3/R1 inference.
On NVIDIA GB200 NVL72, they're achieving 26k input tokens/s and 13k output tokens/s per GPU: nearly a 4× prefill and 5× decode speedup over H100.
They achieved this with an NVFP4 MoE, FP8 attention, and by scaling down expert parallelism. A conceptual sketch of the NVFP4 quantization step follows below.
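For readers curious what NVFP4 quantization means, here is a minimal NumPy sketch, not the team's kernel: NVFP4 stores weights as 4-bit E2M1 values in 16-element micro-blocks, each with its own scale (kept in FP8 E4M3 plus a per-tensor FP32 scale in the real format; plain floats here for simplicity). The function names and block handling are illustrative assumptions.

```python
import numpy as np

# Representable magnitudes of FP4 E2M1, the element format NVFP4 uses.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(w: np.ndarray, block: int = 16) -> np.ndarray:
    """Conceptual NVFP4-style quantization (hypothetical helper):
    per-16-element scales, elements snapped to the nearest E2M1 value.
    Returns the dequantized tensor so rounding error can be inspected."""
    w = w.reshape(-1, block)
    # One scale per micro-block, chosen so the block max maps to E2M1's max (6.0).
    scale = np.abs(w).max(axis=1, keepdims=True) / E2M1_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero blocks
    scaled = w / scale
    signs = np.sign(scaled)
    # Snap each magnitude to the nearest representable E2M1 grid point.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    return (signs * E2M1_GRID[idx] * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
wq = quantize_nvfp4_block(w)
print("max abs rounding error:", np.abs(w - wq).max())
```

Halving the MoE weights to 4 bits is also what allows the smaller expert-parallelism degree: each GPU can hold more experts, so fewer ranks need to be spanned.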