DeepInfra Raises $107 Million in Series B Funding to Expand AI Inference Infrastructure Globally
DeepInfra, a cloud computing company focused on artificial intelligence inference, has raised $107 million in Series B funding (approximately 158.2 billion Korean won) to expand its infrastructure globally. As generative AI moves beyond the experimental stage toward "agent-based AI," DeepInfra aims to overcome the limitations of existing general-purpose cloud computing.
The round was led by 500 Global and Georges Harik, a former Google engineer. NVIDIA, Samsung Electronics' investment arm Samsung Next, Supermicro, A.Capital Ventures, Crescent Cove, Felicis, PEAK6, Upper90, and others also participated. The breadth of the investor group reflects growing attention to the growth potential of the AI infrastructure market.
DeepInfra focuses on the "inference" stage of AI workloads: the stage at which a trained AI model serves user requests in production environments. The company argues that existing cloud platforms were not designed with this kind of demand in mind. For agent-based AI in particular, completing a single task can require dozens to hundreds of model calls, which makes latency and costs highly volatile.
To address this, DeepInfra says it is building a "token factory" that treats inference as a core process rather than an add-on service. The company was founded by the engineering team behind the instant messaging app "imo," which grew to more than 200 million users worldwide, and its strategy is to carry that experience operating large-scale distributed systems over to AI inference infrastructure.
Unlike operators that rely on spare "spot" capacity rented from external servers, DeepInfra runs its own hardware across eight data centers in the United States, describing its approach as improving efficiency by controlling the entire stack, from GPUs to APIs. The company claims that with NVIDIA's distributed inference platform "Dynamo" and Blackwell and Vera Rubin GPUs, it can deliver up to a 20-fold improvement in inference cost efficiency.
DeepInfra also believes agent-based AI consumes far more resources than today's generative AI chatbots. Supporting this view, more than 30% of all token traffic on its platform already comes from autonomous agents, a sign that inference demand is moving beyond simple chatbot responses and rapidly shifting toward automated business processing.
The DeepInfra platform currently supports more than 190 open-source AI models, including NVIDIA's Nemotron series. It also offers a "zero data retention" policy for enterprise customers wary of sending sensitive information to external cloud environments, a move interpreted as an attempt to build a differentiated position in the enterprise AI market by addressing security, cost, and speed at once.
Co-founder and CEO Nikola Borisov said that when he founded the company four years ago, he believed AI inference would become the core of enterprise AI workloads, and that belief is now reality. Open-source models, he argued, are quickly catching up to closed-source models, spreading innovation at lower cost, while agent-based systems are creating sustained, large-scale demand. He emphasized that AI inference is no longer a thin processing layer but will become the "bottleneck" that defines most future AI workloads.
Tony Wang of 500 Global added that as AI inference demand surges, developers and engineers need infrastructure that is faster, more flexible, and more stable. The DeepInfra team, he said, has already proven it can build and operate globally scaled distributed systems, and purpose-built AI inference infrastructure will become a core foundation for the next phase of the AI industry.
Beyond the fundraising itself, this round signals that the focus of competition in AI infrastructure is shifting from training to inference. As agent-based AI goes mainstream, handling inference quickly and at low cost is becoming a new deciding factor in the cloud computing market.
TP AI Notice: This article was summarized using the TokenPost.ai language model. It may omit key content from the original text or contain factual inaccuracies.