An interesting thing happened in the AI inference market that's worth discussing. NVIDIA acquired Groq, and when Jensen Huang started explaining the logic behind the deal, it became clear the move was far from arbitrary.



Until now, the focus was on one thing: how to process more data simultaneously, that is, on throughput. But it turns out the market has split. Some users are willing to pay a premium for a faster response: a token's value now depends not just on producing it, but on how quickly it is generated. This changes the entire game.

Groq specializes in exactly this: low latency. Its LPU architecture is built to deliver deterministic, predictable response times. By acquiring Groq, NVIDIA essentially filled a gap in its portfolio. NVIDIA's GPUs remain the kings of throughput, but the low-latency segment calls for a different architecture.
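The throughput-versus-latency split can be made concrete with a toy model of batched decoding. The sketch below is purely illustrative; the function name and all numbers are hypothetical, not measurements of any real GPU or LPU. It captures the basic mechanism: each decode step has a fixed cost (the weights must be read once per step regardless of batch size) plus a small per-sequence cost, so larger batches raise aggregate throughput while making every individual request wait longer per token.

```python
# Toy model of the batching trade-off in inference serving.
# All parameters are hypothetical and chosen for intuition only.

def decode_step(batch_size, step_ms=5.0, per_seq_ms=0.5):
    """Model one decode step that generates a token for every sequence.

    step_ms    -- fixed cost per step, paid once regardless of batch size
    per_seq_ms -- incremental cost for each sequence in the batch
    Returns (per-token latency in ms, aggregate throughput in tokens/sec).
    """
    latency_ms = step_ms + per_seq_ms * batch_size
    throughput = batch_size / latency_ms * 1000.0
    return latency_ms, throughput

for b in (1, 8, 64):
    lat, tput = decode_step(b)
    print(f"batch={b:3d}  per-token latency={lat:6.1f} ms  "
          f"throughput={tput:8.1f} tok/s")
```

Under these assumed costs, batch 64 produces roughly ten times the tokens per second of batch 1, but each user waits several times longer for every token. A GPU serving stack optimizes the right-hand column; a latency-oriented design like Groq's optimizes the left.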

The new Groq 3 LPU chip is the first product since the acquisition, manufactured on a 4nm process. According to NVIDIA, its energy efficiency on large models exceeds that of the flagship Blackwell NVL72 by a factor of 35. This isn't about absolute speed, but about how much power is needed to achieve that speed.

Practically, this means different solutions can now be offered for different needs: if you want maximum throughput, there are GPUs; if you need the fastest possible response, there is Groq. The same model can be priced differently depending on how fast you want the result. This expands the boundaries of what can be optimized in the inference market.