NVIDIA releases Blackwell cost details: GPUs are twice as expensive, and each token is 35 times cheaper in return


According to Beating Monitoring, NVIDIA has published a blog post analyzing hardware selection for inference. Its core argument: inference infrastructure should be evaluated by "cost per token," not "cost per GPU per hour." Measured by GPU unit price, Blackwell is more expensive; measured by token cost, Blackwell far surpasses the previous generation.

The blog uses DeepSeek-R1 (a mixture-of-experts reasoning model) as the test subject, comparing Blackwell (GB300 NVL72) against the previous-generation Hopper (HGX H200). At cloud-market rental reference prices, Blackwell costs $2.65 per GPU per hour, nearly double Hopper's $1.41, but token output per GPU jumps from 90 to 6,000 per second. That roughly 65-fold throughput gain more than offsets the higher rental price, cutting the cost per million tokens from $4.20 to $0.12. Token output per megawatt rises 50-fold.
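The cost-per-token arithmetic behind these figures can be sketched as follows (a minimal illustration using the article's numbers; the function name and rounding are mine, not NVIDIA's, and the article's $4.20 figure is slightly lower than the raw division yields, presumably due to rounding in its inputs):

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_gpu_second: float) -> float:
    """Dollars per one million output tokens, given rental price and throughput."""
    tokens_per_hour = tokens_per_gpu_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Figures quoted in the article:
hopper = cost_per_million_tokens(1.41, 90)       # HGX H200
blackwell = cost_per_million_tokens(2.65, 6000)  # GB300 NVL72

print(f"Hopper HGX H200:  ${hopper:.2f} per million tokens")
print(f"Blackwell GB300:  ${blackwell:.2f} per million tokens")
print(f"Per-token cost ratio: ~{hopper / blackwell:.0f}x")
```

Dividing the two results reproduces the roughly 35-fold per-token cost advantage in the headline.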

Caveats: the $0.12 figure assumes all software optimizations are enabled, including FP4 low-precision inference and MTP (multi-token prediction, which lets the model emit several tokens per step to speed up generation). SemiAnalysis InferenceX v2 raw data shows that running DeepSeek-R1 on GB300 NVL72 without MTP costs about $2.35 per million tokens; with MTP enabled, it drops to about $0.11, a roughly 21-fold difference from this one optimization alone. All figures come from tests of the single DeepSeek-R1 model; other model architectures and scales will yield different numbers.
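The quoted MTP impact checks out arithmetically (a quick sanity calculation on the SemiAnalysis figures cited above; variable names are mine):

```python
# Per-million-token costs for DeepSeek-R1 on GB300 NVL72, as quoted in the article
without_mtp = 2.35  # USD, MTP disabled
with_mtp = 0.11     # USD, MTP enabled

mtp_factor = without_mtp / with_mtp
print(f"MTP alone cuts per-token cost ~{mtp_factor:.0f}x")
```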
