I found this story about NVIDIA and Groq quite interesting. Basically, what Jensen Huang explained now makes a lot of strategic sense.



The whole thing started because the inference market changed. Before, everyone focused on one thing: increasing throughput, processing as many requests as possible at the same time. But then people realized that's not always what matters in practice. Some users are willing to pay more for faster responses, regardless of total volume.

It's like this: you have two identical models, but one responds in 50ms and the other in 500ms. If you're a developer building a real-time application, how much more would you pay for the one that's 10 times faster? Exactly: the low-latency market is a completely different one from the high-throughput market.

That's where Groq comes into this story. Their LPU architecture is designed specifically for this: deterministic, low-latency inference. While NVIDIA's GPUs dominate the massive-throughput side, Groq fills a completely different gap. When you look at the Groq 3 LPU released in March, built on Samsung's 4nm process, the claimed inference capacity per megawatt on trillion-parameter models is 35 times that of the Blackwell NVL72. That's no small feat.

What Huang is basically saying is that NVIDIA understood there isn't a single inference market, but two very distinct segments with completely different pricing dynamics. You might have lower throughput, but if the unit price per token is much higher, it's worth it. It's like expanding the market's Pareto frontier.
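The "lower throughput but higher unit price" argument can be sketched with a toy calculation. All the numbers below are hypothetical, chosen only to illustrate that revenue per second is throughput times price, so a premium low-latency tier can out-earn a bulk tier even at a fraction of the token volume:

```python
# Toy illustration (hypothetical numbers, not real pricing): revenue per
# second for two serving regimes, bulk high-throughput vs premium low-latency.

def revenue_per_second(tokens_per_second: float,
                       price_per_million_tokens: float) -> float:
    """Revenue/s = throughput (tokens/s) x unit price ($/token)."""
    return tokens_per_second * price_per_million_tokens / 1_000_000

# High-throughput regime: lots of tokens at commodity pricing.
bulk = revenue_per_second(tokens_per_second=100_000,
                          price_per_million_tokens=0.50)

# Low-latency regime: a tenth of the throughput, but a 20x token price.
fast = revenue_per_second(tokens_per_second=10_000,
                          price_per_million_tokens=10.00)

print(f"bulk: ${bulk:.4f}/s, fast: ${fast:.4f}/s")
# Despite 10x less throughput, the premium tier earns 2x more per second.
```

The point is only about the shape of the trade-off: as long as the price multiple exceeds the throughput penalty, the low-latency segment is the more valuable one to serve.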

This was a very well-thought-out move by NVIDIA, honestly. They recognized a gap and went after it. Jonathan Ross and Groq's team continue to operate independently, but now with NVIDIA's full backing. It seems like someone is finally thinking about inference in a more sophisticated way.