DeepInfra Raises $107 Million in Series B Funding... Global Expansion of AI Inference Infrastructure


DeepInfra, a cloud computing company focused on artificial intelligence inference, has raised $107 million in Series B funding to expand its infrastructure globally. The amount is approximately 158.2 billion Korean won (at an exchange rate of roughly 1,478 won per dollar). As generative AI moves beyond the experimental stage and transitions toward "agent-based AI," DeepInfra aims to break through the limitations of existing general-purpose cloud computing.

This round was led by 500 Global and Georges Harik, a former Google Cloud engineer. NVIDIA, Samsung Electronics' investment arm Samsung Next, Supermicro, A.Capital Ventures, Crescent Cove, Felicis, Peak6, Upper90, and others also participated. Notably, these major investors are all betting on the growth potential of the AI infrastructure market.

DeepInfra is a company focused on the "inference" stage of AI workloads. Inference is the stage in which a trained AI model processes user requests in real-world service environments. The company believes existing cloud platforms were not designed with this kind of demand in mind. For agent-based AI in particular, completing a single task may require dozens to hundreds of model calls, which can make latency and costs sharply unstable.
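As a rough illustration of that scaling problem, the Python sketch below shows how latency and cost multiply when an agent chains many sequential model calls. The per-call figures are invented for illustration only; they are not DeepInfra measurements.

```python
# Hypothetical back-of-the-envelope sketch (not DeepInfra code): why an agent
# that chains many model calls sees latency and cost multiply per task.
# All numbers below are illustrative assumptions, not measured figures.

PER_CALL_LATENCY_S = 0.8   # assumed average latency of one inference call
PER_CALL_COST_USD = 0.002  # assumed average cost of one inference call

def estimate_agent_task(num_model_calls: int) -> tuple[float, float]:
    """Estimate total latency and cost for an agent task that issues
    num_model_calls sequential model calls."""
    total_latency = num_model_calls * PER_CALL_LATENCY_S
    total_cost = num_model_calls * PER_CALL_COST_USD
    return total_latency, total_cost

# A chatbot answer is roughly one call; the article's agent scenario
# involves dozens to hundreds of calls per task.
for calls in (1, 50, 200):
    latency, cost = estimate_agent_task(calls)
    print(f"{calls:>3} calls -> ~{latency:.1f}s, ~${cost:.3f}")
```

Because the calls are sequential, any per-call variance also compounds, which is why the article describes latency and cost as "sharply unstable" for agent workloads.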

To solve this problem, DeepInfra says it is building a "token factory" that treats inference as a core process rather than an add-on service. The company was founded by the engineering team behind the instant messaging app "imo," which grew to more than 200 million users worldwide. Its strategy is to transfer that experience operating large-scale distributed systems to AI inference infrastructure.

Unlike operators that rely on renting spare capacity from external clouds, such as spot instances, DeepInfra operates its own hardware across eight data centers in the United States. It describes its approach as improving efficiency by controlling the entire stack, from GPUs to APIs. The company claims that, using NVIDIA's distributed inference platform "Dynamo" together with Blackwell and Vera Rubin GPUs, it can deliver up to a 20x improvement in inference cost efficiency.

In particular, DeepInfra believes agent-based AI consumes far more resources than today's generative AI chatbots. More than 30% of all token traffic on its platform already comes from autonomous agents, which supports this view. AI inference demand is moving beyond simple chatbot responses and shifting rapidly toward automated business processing.

Currently, the DeepInfra platform supports more than 190 open-source AI models, including NVIDIA's Nemotron series. It also offers a "zero data retention" policy for enterprise customers wary of sending sensitive information to external cloud environments. This has been read as an effort to build a differentiated position in the enterprise AI market by addressing security, cost, and speed at the same time.
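For context on how such a platform is typically consumed, the sketch below shows one way to call an open-source model through DeepInfra's OpenAI-compatible API using the openai Python client. The endpoint URL and model name follow DeepInfra's public documentation at the time of writing but should be treated as assumptions and verified against the current docs; the API key is a placeholder.

```python
# A minimal sketch of querying a hosted open-source model via DeepInfra's
# OpenAI-compatible endpoint. Base URL and model name are assumptions
# taken from public docs; verify both before relying on this.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # OpenAI-compatible gateway
    api_key="YOUR_DEEPINFRA_API_KEY",                # placeholder credential
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",   # example open-source model
    messages=[{"role": "user", "content": "Summarize what AI inference is."}],
)
print(response.choices[0].message.content)
```

Exposing an OpenAI-compatible interface is a common design choice for inference providers, since it lets existing client code switch backends by changing only the base URL and model name.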

Co-founder and CEO Nikola Borisov said that when he founded the company four years ago, he believed AI inference would become the core of enterprise AI workloads, and that belief is now reality. He observed that open-source models are quickly catching up to closed-source models, spreading innovation at lower cost, and that agent-based systems are creating sustained, large-scale demand. He went on to emphasize that AI inference is no longer a thin processing layer but will become the "bottleneck" that defines most future AI workloads.

Tony Wang of 500 Global commented that as AI inference demand surges, developers and engineers need infrastructure that is faster, more flexible, and more stable. He said the DeepInfra team has already proven its ability to build and operate globally scaled distributed systems, and he believes purpose-built AI inference infrastructure will become a core foundation supporting the next phase of the AI industry.

This funding round is more than a fundraising event: it signals that the focus of competition in AI infrastructure is shifting from training to inference. As agent-based AI goes mainstream in earnest, the ability to handle AI inference quickly and at low cost is becoming a new deciding factor in the cloud computing market.
