I found this story about NVIDIA and Groq quite interesting. Basically, what Jensen Huang explained now makes a lot of strategic sense.



The whole thing started because the inference market changed. Before, everyone focused on one thing: increasing throughput, processing as many requests as possible at the same time. But then people realized that's not always what matters in practice. Some users are willing to pay more for faster responses, regardless of total volume.

It's like this: you have two identical models, but one responds in 50ms and the other in 500ms. If you're a developer building a real-time application, how much more would you pay for the one that's 10 times faster? Exactly: the low-latency market is a completely different one from the high-throughput market.

That's where Groq comes into this story. Their LPU architecture is designed specifically for this: deterministic, low-latency inference. While NVIDIA's GPUs dominate the massive-throughput side, Groq fills a completely different gap. When you look at the Groq 3 LPU released in March, built on Samsung's 4nm process, the claimed inference capacity per megawatt on trillion-parameter models is 35 times that of the Blackwell NVL72. That's no small feat.

What Huang is basically saying is that NVIDIA understood there isn't a single inference market, but two very distinct segments with completely different pricing dynamics. You might have lower throughput, but if the unit price per token is much higher, it's worth it. It's like expanding the market's Pareto frontier.
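The "lower throughput but higher unit price" argument can be sketched with a toy calculation. All the numbers below are hypothetical, chosen only to illustrate that revenue per second is throughput times price, so a premium low-latency tier can out-earn a bulk tier even at a fraction of the token volume:

```python
# Toy illustration (hypothetical numbers, not real pricing): revenue per
# second for two serving regimes, bulk high-throughput vs premium low-latency.

def revenue_per_second(tokens_per_second: float,
                       price_per_million_tokens: float) -> float:
    """Revenue/s = throughput (tokens/s) x unit price ($/token)."""
    return tokens_per_second * price_per_million_tokens / 1_000_000

# High-throughput regime: lots of tokens at commodity pricing.
bulk = revenue_per_second(tokens_per_second=100_000,
                          price_per_million_tokens=0.50)

# Low-latency regime: a tenth of the throughput, but a 20x token price.
fast = revenue_per_second(tokens_per_second=10_000,
                          price_per_million_tokens=10.00)

print(f"bulk: ${bulk:.4f}/s, fast: ${fast:.4f}/s")
# Despite 10x less throughput, the premium tier earns 2x more per second.
```

The point is only about the shape of the trade-off: as long as the price multiple exceeds the throughput penalty, the low-latency segment is the more valuable one to serve.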

This was a very well-thought-out move by NVIDIA, honestly. They recognized a gap and went after it. Jonathan Ross and Groq's team continue to operate independently, but now with NVIDIA's full backing. It seems like someone is finally thinking about inference in a more sophisticated way.