NVIDIA Reveals Blackwell Cost Breakdown: GPU Costs Double, but Token Costs Drop 35-Fold

According to monitoring by Dongcha Beating, NVIDIA’s blog breaks down how to choose inference hardware, and its core argument can be summarized in one sentence: inference infrastructure should be evaluated by cost per token, not by cost per GPU per hour. Measured by GPU price, Blackwell is more expensive; measured by token cost, it significantly outperforms the previous generation.

The blog uses DeepSeek-R1 (an MoE reasoning model) as the test workload, comparing Blackwell (GB300 NVL72) against the previous-generation Hopper (HGX H200). At cloud-market rental reference prices, Blackwell costs $2.65 per GPU per hour, nearly double Hopper’s $1.41, but token output per GPU jumps from 90 to 6,000 tokens per second, a roughly 65-fold throughput gain. As a result, the cost per million tokens falls from $4.20 to $0.12, and token output per megawatt rises 50-fold.

It’s important to note that the $0.12 figure assumes multiple software optimizations are fully enabled, including FP4 low-precision inference and MTP (multi-token prediction, which lets the model generate several tokens per step for speed). Raw data from SemiAnalysis InferenceX v2 shows that the same GB300 NVL72 running DeepSeek-R1 costs about $2.35 per million tokens without MTP, dropping to about $0.11 with MTP enabled, a roughly 21-fold difference from this single optimization. All of the above results come from tests of the single DeepSeek-R1 model; numbers may vary for other model architectures and scales.
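The cost-per-token conversion behind these figures is straightforward arithmetic: hourly rental price divided by tokens produced per hour. A minimal sketch, using the rounded reference prices and throughput numbers quoted above (small discrepancies against the article’s $4.20/$0.12 figures come from that rounding):

```python
# Convert an hourly GPU rental price and per-GPU throughput into
# dollars per one million output tokens.

def cost_per_million_tokens(price_per_gpu_hour: float,
                            tokens_per_sec_per_gpu: float) -> float:
    """Dollars per 1M output tokens for a single rented GPU."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return price_per_gpu_hour / tokens_per_hour * 1_000_000

# Reference values quoted in the article.
hopper = cost_per_million_tokens(1.41, 90)        # HGX H200
blackwell = cost_per_million_tokens(2.65, 6000)   # GB300 NVL72

print(f"Hopper:    ${hopper:.2f} per 1M tokens")
print(f"Blackwell: ${blackwell:.2f} per 1M tokens")
print(f"Ratio:     {hopper / blackwell:.1f}x cheaper per token")
```

Running this reproduces the article’s headline ratio: despite the near-doubled hourly price, the ~65x throughput gain makes each token roughly 35 times cheaper.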
