Something interesting has happened in the AI inference market that's worth discussing. NVIDIA acquired Groq, and when Jensen Huang began explaining the logic behind the deal, it became clear there was more to it than meets the eye.
Until now, the focus was on a single metric: how much data can be processed simultaneously, that is, throughput. But the market has split. Some users are willing to pay a premium to get a response faster: tokens have become more expensive, and the time it takes to generate them now carries real value. That changes the entire game.
Groq specializes in exactly this: low latency. Its LPU architecture is built to deliver deterministic, predictable latency. By acquiring Groq, NVIDIA essentially filled a gap in its portfolio: its GPUs remain the kings of throughput, but the low-latency segment calls for a different architecture.
The new Groq 3 LPU chip is the first product since the merger, manufactured on a 4nm process. According to NVIDIA, its efficiency on large models exceeds that of the flagship Blackwell NVL72 by a factor of 35. The point is not absolute speed but how much power it takes to achieve that speed.
In practice, this means different solutions can now be offered for different needs: if you want maximum throughput, there's the GPU; if you need a fast response at any cost, there's Groq. The same model can be priced differently depending on how quickly you want the result, which widens the scope of what can be optimized in the inference market.
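The latency-tiered pricing idea above can be sketched in a few lines. This is an illustrative toy model only: the function name, prices, and multipliers are all hypothetical assumptions, not real Gate, NVIDIA, or Groq pricing.

```python
def cost_per_request(tokens: int, price_per_1k_tokens: float,
                     latency_multiplier: float) -> float:
    """Token cost scaled by a latency-tier multiplier (all numbers hypothetical)."""
    return tokens / 1000 * price_per_1k_tokens * latency_multiplier

# Same model, same 2000-token request, two serving tiers (assumed figures):
batch_tier = cost_per_request(2000, price_per_1k_tokens=0.50,
                              latency_multiplier=1.0)  # throughput-first (GPU-style)
fast_tier = cost_per_request(2000, price_per_1k_tokens=0.50,
                             latency_multiplier=3.0)   # latency-first (LPU-style)

print(f"batch: ${batch_tier:.2f}, fast: ${fast_tier:.2f}")  # → batch: $1.00, fast: $3.00
```

The only moving part is the multiplier: the base per-token price stays fixed, and the buyer pays more for a guaranteed fast response, which is exactly the split the post describes.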