DeepInfra Raises $107 Million in Series B Funding... Global Expansion of AI Inference Infrastructure


DeepInfra, a cloud computing company focused on artificial intelligence inference, has raised $107 million in Series B funding to expand its infrastructure globally. The amount is approximately 158.2 billion Korean won (at an exchange rate of roughly 1,478 won per dollar). As generative AI moves beyond the experimental stage and transitions toward "agent-based AI," DeepInfra aims to break through the limitations of existing general-purpose cloud computing.

This round was led by 500 Global and Georges Harik, a former Google Cloud engineer. NVIDIA, Samsung Electronics' investment arm Samsung Next, Supermicro, A.Capital Ventures, Crescent Cove, Felicis, Peak6, Upper90, and others also participated. Notably, these major investors are all betting on the growth potential of the AI infrastructure market.

DeepInfra is a company focused on the "inference" stage of AI workloads. Inference is the stage in which a trained AI model processes user requests in real-world service environments. The company believes existing cloud platforms were not designed with this kind of demand in mind. For agent-based AI in particular, completing a single task may require dozens to hundreds of model calls, which can make latency and costs sharply unstable.
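As a rough illustration of that scaling problem, the Python sketch below shows how latency and cost multiply when an agent chains many sequential model calls. The per-call figures are invented for illustration only; they are not DeepInfra measurements.

```python
# Hypothetical back-of-the-envelope sketch (not DeepInfra code): why an agent
# that chains many model calls sees latency and cost multiply per task.
# All numbers below are illustrative assumptions, not measured figures.

PER_CALL_LATENCY_S = 0.8   # assumed average latency of one inference call
PER_CALL_COST_USD = 0.002  # assumed average cost of one inference call

def estimate_agent_task(num_model_calls: int) -> tuple[float, float]:
    """Estimate total latency and cost for an agent task that issues
    num_model_calls sequential model calls."""
    total_latency = num_model_calls * PER_CALL_LATENCY_S
    total_cost = num_model_calls * PER_CALL_COST_USD
    return total_latency, total_cost

# A chatbot answer is roughly one call; the article's agent scenario
# involves dozens to hundreds of calls per task.
for calls in (1, 50, 200):
    latency, cost = estimate_agent_task(calls)
    print(f"{calls:>3} calls -> ~{latency:.1f}s, ~${cost:.3f}")
```

Because the calls are sequential, any per-call variance also compounds, which is why the article describes latency and cost as "sharply unstable" for agent workloads.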

To solve this problem, DeepInfra says it is building a "token factory" that treats inference as a core process rather than an add-on service. The company was founded by the engineering team behind the instant messaging app "imo," which grew to more than 200 million users worldwide. Its strategy is to transfer that experience operating large-scale distributed systems to AI inference infrastructure.

Unlike operators that rely on renting spare capacity from external clouds, such as spot instances, DeepInfra operates its own hardware across eight data centers in the United States. It describes its approach as improving efficiency by controlling the entire stack, from GPUs to APIs. The company claims that, using NVIDIA's distributed inference platform "Dynamo" together with Blackwell and Vera Rubin GPUs, it can deliver up to a 20x improvement in inference cost efficiency.

In particular, DeepInfra believes agent-based AI consumes far more resources than today's generative AI chatbots. More than 30% of all token traffic on its platform already comes from autonomous agents, which supports this view. AI inference demand is moving beyond simple chatbot responses and shifting rapidly toward automated business processing.

Currently, the DeepInfra platform supports more than 190 open-source AI models, including NVIDIA's Nemotron series. It also offers a "zero data retention" policy for enterprise customers wary of sending sensitive information to external cloud environments. This has been read as an effort to build a differentiated position in the enterprise AI market by addressing security, cost, and speed at the same time.
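For context on how such a platform is typically consumed, the sketch below shows one way to call an open-source model through DeepInfra's OpenAI-compatible API using the openai Python client. The endpoint URL and model name follow DeepInfra's public documentation at the time of writing but should be treated as assumptions and verified against the current docs; the API key is a placeholder.

```python
# A minimal sketch of querying a hosted open-source model via DeepInfra's
# OpenAI-compatible endpoint. Base URL and model name are assumptions
# taken from public docs; verify both before relying on this.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # OpenAI-compatible gateway
    api_key="YOUR_DEEPINFRA_API_KEY",                # placeholder credential
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",   # example open-source model
    messages=[{"role": "user", "content": "Summarize what AI inference is."}],
)
print(response.choices[0].message.content)
```

Exposing an OpenAI-compatible interface is a common design choice for inference providers, since it lets existing client code switch backends by changing only the base URL and model name.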

Co-founder and CEO Nikola Borisov said that when he founded the company four years ago, he believed AI inference would become the core of enterprise AI workloads, and that belief is now reality. He observed that open-source models are quickly catching up to closed-source models, spreading innovation at lower cost, and that agent-based systems are creating sustained, large-scale demand. He went on to emphasize that AI inference is no longer a thin processing layer but will become the "bottleneck" that defines most future AI workloads.

Tony Wang of 500 Global commented that as AI inference demand surges, developers and engineers need infrastructure that is faster, more flexible, and more stable. He said the DeepInfra team has already proven its ability to build and operate globally scaled distributed systems, and he believes purpose-built AI inference infrastructure will become a core foundation supporting the next phase of the AI industry.

This funding round is more than a fundraising event: it signals that the focus of competition in AI infrastructure is shifting from training to inference. As agent-based AI goes mainstream in earnest, the ability to handle AI inference quickly and at low cost is becoming a new deciding factor in the cloud computing market.
