NVIDIA recently made an interesting move in the inference market: it acquired Groq's chip business for $200 billion, bringing in the key team led by Jonathan Ross. The curious part is that Groq continues to operate as an independent company, so this is not a full acquisition.

What caught my attention was Jensen Huang's explanation of why they did this. It turns out the inference market is not monolithic. Previously, everything revolved around squeezing out more throughput, period. Now that has changed: users are willing to pay different prices depending on response speed. If an engineer can process tokens faster and be more productive, they are willing to pay for that.

That's where Groq comes in. Its LPU architecture is known for low, deterministic latency, the opposite of what NVIDIA optimizes for with its high-throughput GPUs. It's as if they are completing a spectrum: maximum throughput on one side, minimum latency on the other. Two market segments, two price points, same model.
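To make that tradeoff concrete, here is a minimal sketch of how the same model could be sold in two tiers: a throughput-optimized batch tier and a latency-optimized interactive tier. All the numbers and prices are hypothetical, chosen only to illustrate the segmentation, not taken from any vendor.

```python
from dataclasses import dataclass

@dataclass
class ServingTier:
    name: str
    time_to_first_token_s: float  # latency before the first token arrives
    tokens_per_s: float           # sustained decode speed for one request
    price_per_1m_tokens: float    # what this tier charges (hypothetical)

def request_latency(tier: ServingTier, n_tokens: int) -> float:
    """Wall-clock time to stream n_tokens from this tier."""
    return tier.time_to_first_token_s + n_tokens / tier.tokens_per_s

# Same model, two hardware/serving profiles (invented figures).
batch = ServingTier("throughput-optimized (GPU batch)", 2.0, 60.0, 0.50)
interactive = ServingTier("latency-optimized (LPU-style)", 0.2, 300.0, 2.00)

for tier in (batch, interactive):
    t = request_latency(tier, 1_000)
    print(f"{tier.name}: 1k tokens in {t:.1f}s at ${tier.price_per_1m_tokens}/1M tokens")
```

With these made-up figures, the interactive tier delivers the same thousand tokens roughly five times faster and charges four times more per token, which is exactly the "two prices, same model" logic the post describes.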

At the March GTC conference, they launched the Groq 3 LPU, built on Samsung's 4 nm process. The headline number is striking: 35 times more inference efficiency per megawatt than NVIDIA's Blackwell NVL72. It's the kind of differentiation that opens new markets instead of just competing in existing ones.
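As a sanity check on what "inference efficiency per megawatt" means, here is a tiny worked example. The token and power figures below are invented for illustration; only the 35x ratio comes from the claim above.

```python
def tokens_per_megawatt(tokens_per_s: float, power_mw: float) -> float:
    """Inference efficiency: sustained tokens per second per megawatt drawn."""
    return tokens_per_s / power_mw

# Hypothetical rack-level numbers, chosen only to reproduce the claimed ratio.
nvl72_tokens_s, nvl72_power_mw = 1_000_000, 0.12   # assumed GPU baseline
lpu_tokens_s, lpu_power_mw = 3_500_000, 0.012      # assumed Groq 3 rack

ratio = (tokens_per_megawatt(lpu_tokens_s, lpu_power_mw)
         / tokens_per_megawatt(nvl72_tokens_s, nvl72_power_mw))
print(f"efficiency ratio ~ {ratio:.0f}x")  # prints 35x under these assumptions
```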

Groq's role here is clear: while NVIDIA dominates raw throughput, Groq specializes in serving users who value speed above all else. Two strategies, one more complete ecosystem.