When AI Bills Spiral Out of Control, Model Routers Become Enterprises' New Cost-Saving Favorite

As enterprise AI costs continue to climb, a technology known as the "model router" is rapidly moving from niche tool to mainstream. These systems automatically assign each task to the most suitable AI model based on its complexity, significantly reducing expenses without a notable loss in quality, and they are drawing attention from startups and large enterprises alike.

The core logic of model routers is that not every task requires the most expensive frontier model. Basic work such as summarizing emails or retrieving documents can be handled by open-source models or older proprietary models at a fraction of the cost of top-tier models. Companies such as Snowflake and Palo Alto Networks have confirmed to The Information that they achieved substantial cost savings by moving specific tasks to cheaper models.
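The routing idea described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the model names, prices, and task labels are hypothetical placeholders, and real routers typically classify tasks with far more nuance than a fixed set of labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    """A model and its price. Names and prices are hypothetical."""
    name: str
    usd_per_million_tokens: float


CHEAP = ModelTier("small-open-model", 0.10)   # assumed price, for illustration
FRONTIER = ModelTier("frontier-model", 10.00)  # assumed price, for illustration

# Task labels are assumptions; they mirror the "basic work" the article mentions.
SIMPLE_TASKS = {"summarize_email", "retrieve_document"}


def route(task_type: str) -> ModelTier:
    """Send basic tasks to the cheap tier and everything else to the frontier tier."""
    return CHEAP if task_type in SIMPLE_TASKS else FRONTIER


def estimated_cost(task_type: str, tokens: int) -> float:
    """Estimated spend in USD for one request under the routing rule above."""
    return route(task_type).usd_per_million_tokens * tokens / 1_000_000
```

Under these assumed prices, a one-million-token batch of email summaries would be billed at the cheap tier's rate rather than the frontier rate, which is the source of the savings the article describes.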

This trend is generating real business returns. Construction firm McCarthy Building reported that through Palantir's routing tool Evolve, its quarterly AI token usage dropped 60% compared to the same period last year. Palantir itself disclosed that in one specific case, the tool reduced computing costs by 97% by switching tasks from OpenAI's GPT-5.1 to the smaller GPT-5.4 Nano model.

From Manual Model Selection to Automatic Routing: An Industry Turning Point

The concept of model routers is not entirely new, but it truly entered the public eye after OpenAI released GPT-5. This model automatically switches between different models within ChatGPT based on the complexity of user prompts, embedding routing logic directly into the product. Since then, routers capable of scheduling models across multiple providers have rapidly proliferated.

Currently, routers on the market come in various forms: standalone products, built-in modules from cloud service providers, and custom solutions built by enterprise IT departments. The common goal of these tools is to replace manual model selection by users, thereby reducing costs while maintaining output quality.

Databricks' Unity AI Gateway is one example. CEO Ali Ghodsi said the tool is "very popular" because many enterprises "are burning through their budgets too quickly." Databricks had been using it internally for some time before rolling it out to customers.

From Startups to Tech Giants: Everyone Is Joining In

The router market is attracting players of all sizes. According to an earlier report by The Information, OpenRouter, a startup that provides routing technology, closed a new $120 million funding round in April, reflecting strong investor enthusiasm for the space.

OpenRouter's "automatic router" decides which model to call based on user preferences for cost and quality (set on a scale of 0 to 10). Data shows that the router selects Google's relatively inexpensive Gemini 2.5 Flash Lite about one-third of the time, while calling OpenAI's more powerful GPT-5.5 only about 10% of the time. OpenRouter's automatic router is powered at its core by startup Not Diamond, which specializes in developing routing systems for AI coding agents.
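One way to picture a cost-versus-quality preference on a 0 to 10 scale is as a weight in a simple scoring function. The sketch below is an assumption made for illustration only: it is not OpenRouter's or Not Diamond's algorithm, and the candidate names, quality scores, and normalized costs are invented.

```python
from typing import Sequence, Tuple

# (name, quality in 0..1, cost in 0..1) -- all values are hypothetical.
Candidate = Tuple[str, float, float]

CANDIDATES: Sequence[Candidate] = (
    ("budget-model", 0.60, 0.05),
    ("mid-model", 0.80, 0.40),
    ("frontier-model", 0.95, 1.00),
)


def pick_model(candidates: Sequence[Candidate], quality_preference: int) -> str:
    """Pick a model given a preference from 0 (cheapest) to 10 (best quality)."""
    if not 0 <= quality_preference <= 10:
        raise ValueError("quality_preference must be between 0 and 10")
    weight = quality_preference / 10

    def score(candidate: Candidate) -> float:
        _, quality, cost = candidate
        # Reward quality in proportion to the preference, penalize cost otherwise.
        return weight * quality - (1 - weight) * cost

    return max(candidates, key=score)[0]
```

With these made-up numbers, a preference of 0 selects the budget model and a preference of 10 selects the frontier model; settings in between trade one off against the other.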

Japanese AI lab Sakana AI recently released a router-based multi-model collaboration system. In tests, the system assigned math problems mainly to OpenAI's GPT-5.5 and science problems mainly to Google's Gemini, because it judged each of those models superior to the alternatives in its respective domain. Sakana AI claims the system's overall performance on benchmarks covering programming, engineering, scientific tasks, and reasoning is "on par" with Anthropic's Fable 5 and Mythos Preview models.

AI coding startup Cognition also released a new router this week. It uses the company's internal benchmarks to identify the relative strengths of different agents and introduces a "sidekick" agent to handle simpler tasks. Cognition said the router matched Fable 5's score on one coding benchmark at 35% lower cost.

DIY Routing: Low-Cost Solutions Also Work

Not every enterprise needs to buy a specialized routing product. Developers can build their own routers using AI coding agents such as Claude Code, or simply let an AI model decide which model is best suited to a specific query.

Hunter Bown, who works on AI agents at Arcee AI, said he habitually uses DeepSeek V4 Flash for model selection because of its low cost. His approach is to provide DeepSeek with a list of models and let it determine which model is best for handling the current prompt.
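The approach of handing a cheap model a list of candidates can be sketched as follows. This is an illustration under stated assumptions, not Bown's actual setup: `ask_selector` is a hypothetical callable standing in for whatever client sends a prompt to the selector model, and the prompt wording and fallback behavior are invented.

```python
from typing import Callable, Sequence


def build_selection_prompt(models: Sequence[str], user_prompt: str) -> str:
    """Build the text sent to the cheap selector model."""
    listing = "\n".join(f"- {name}" for name in models)
    return (
        "Choose the single best model for the request below.\n"
        "Reply with only the model name.\n\n"
        f"Available models:\n{listing}\n\n"
        f"Request:\n{user_prompt}"
    )


def choose_model(
    models: Sequence[str],
    user_prompt: str,
    ask_selector: Callable[[str], str],  # hypothetical: sends text, returns the reply
    fallback: str,
) -> str:
    """Ask the selector which model to use; fall back if the reply is not in the list."""
    reply = ask_selector(build_selection_prompt(models, user_prompt)).strip()
    return reply if reply in models else fallback
```

Validating the reply against the supplied list is a design choice worth keeping in any variant: a selector model can return a name that does not exist, and the fallback keeps the request from failing in that case.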

However, such "quick-build" solutions have their limitations. Shriyash Upadhyay, founder of router provider Martian, pointed out that more complex routers sometimes show impressive benchmark scores but may not match them in actual performance. He also noted that even with more sophisticated routers, predicting the best model based solely on the user's first prompt is quite challenging.

Upadhyay said that the rapid pace of model iteration and constantly changing capability differences make routing decisions increasingly complex. "Companies don't have infinite data on all different tasks, so you have to really go deep into the models to figure out what they're good at." To this end, when making routing decisions, Martian not only considers the output results of models but also examines the internal computational processes that constitute these models.

Cost Pressure Persists, Demand for Routers Expected to Grow

Enterprise anxiety over AI costs is not a short-term phenomenon. As employees use advanced AI models more and more heavily (the so-called "tokenmaxxing" phenomenon), management scrutiny of AI spending is intensifying as well. That backdrop should sustain demand for model routers.

Beyond routing, Palantir's Evolve tool can automatically adjust prompt content to suit the selected model and prevent the same request from being sent to the model repeatedly, a common cause of inflated bills. The McCarthy Building case shows that by optimizing prompt structure, enterprises can consume fewer tokens while still using frontier models and getting the same output.
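Suppressing repeated requests can be approximated with a simple cache keyed on the model and prompt. The article does not say how Evolve does this, so the sketch below is only one plausible technique; `send` is a hypothetical callable that stands in for a real model client.

```python
import hashlib
from typing import Callable, Dict


class DedupingClient:
    """Wraps a model call and reuses the answer when an identical request repeats."""

    def __init__(self, send: Callable[[str, str], str]) -> None:
        self._send = send  # hypothetical: (model, prompt) -> response text
        self._cache: Dict[str, str] = {}
        self.calls_sent = 0  # number of requests that actually reached the model

    def complete(self, model: str, prompt: str) -> str:
        # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
        key = hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._send(model, prompt)
            self.calls_sent += 1
        return self._cache[key]
```

Exact-match caching like this only catches byte-identical repeats, and it assumes a cached answer is acceptable for a repeated prompt, which may not hold for requests that depend on fresh data or sampling randomness.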

For investors, the heating up of the model router market has two implications. On one hand, startups focused on routing technology, such as OpenRouter, are attracting capital. On the other, companies such as Databricks and Palantir, which integrate routing into their enterprise AI platforms, are using it to make their products more competitive. As AI infrastructure spending continues to expand, the tooling layer that helps businesses control that spending is becoming an emerging market that cannot be ignored.

Risk Warning and Disclaimer

Markets carry risk, and investment calls for caution. This article does not constitute personal investment advice, nor does it take into account the specific investment objectives, financial circumstances, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article suit their particular situation. Any investment made on this basis is at your own risk.