Countdown to the End of the AI High-Pricing Era? Five Structural Reasons Why Tokens Must Drop in Price

Diminishing performance gains, open-source models at a tenth of the price, specialized chips slashing inference costs, and zero switching costs that let users jump ship instantly are all pushing prices down, and local models could end the subscription model within 4 to 5 years. Is the room for AI giants to keep prices high narrowing rapidly?

Table of Contents

  • Double Squeeze from Performance Ceiling and Open Source
  • Chip Revolution and Zero Switching Costs
  • Local Models: The Ultimate Threat to Subscription Models

Software engineer Aditya Patadia pointed out on his personal blog that Uber burned through its entire annual AI budget in 4 months, and that Microsoft, Salesforce, and GitHub have also announced plans to rein in employee AI spending. This is an industry-wide dilemma, not just a matter of financial discipline at individual companies. Even so, he predicts that the high prices charged by today's top AI companies are about to reverse.

Double Squeeze from Performance Ceiling and Open Source

Patadia's first observation: model performance improvements are showing diminishing marginal returns. Each new iteration still brings progress, but the gains keep getting smaller, and the training-data problem is structural: major AI labs have likely already digested nearly all of humanity's digitized written knowledge, which makes it extremely difficult to improve training sets any further.

As evidence, he points out that Claude Opus 4.8 is priced the same as Claude Opus 4.7: when a new generation can no longer demonstrate a significant leap over the last, the justification for raising prices disappears, and cutting prices becomes the only competitive option.

The second pressure comes from the open-source camp. Take GLM-5.2 as an example: this open-source model is comparable to GPT 5.5 and Claude Opus on coding benchmarks, yet it is priced at only one-tenth of GPT 5.5, giving it an overwhelming pricing advantage.

Patadia's judgment: As long as open-source models continue to narrow the performance gap with closed-source flagships, the pricing room for closed-source models will keep shrinking.

Chip Revolution and Zero Switching Costs

Another source of pressure on AI pricing comes from the hardware side. Patadia points out that AI-specific chips developed by companies such as Cerebras, Groq, and Google are resetting the baseline for inference costs. For example, inference on Google's TPU costs 30% to 70% less than on Nvidia's H100 GPU.
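
To make the 30% to 70% range concrete, here is a minimal arithmetic sketch. The $1,000 baseline and the function name `cost_after_savings` are illustrative assumptions, not figures or code from Patadia's post.

```python
def cost_after_savings(baseline_cost, low_saving=0.30, high_saving=0.70):
    """Return (lowest, highest) cost after applying a range of savings.

    The default 30%-70% range is the TPU-vs-H100 figure quoted in the
    article; baseline_cost is whatever the same workload costs today.
    """
    if baseline_cost < 0:
        raise ValueError("baseline_cost must be non-negative")
    if not 0 <= low_saving <= high_saving <= 1:
        raise ValueError("savings must satisfy 0 <= low <= high <= 1")
    lowest = baseline_cost * (1 - high_saving)   # best case: 70% cheaper
    highest = baseline_cost * (1 - low_saving)   # worst case: 30% cheaper
    return lowest, highest


# Hypothetical baseline: $1,000 of monthly inference spend on H100 GPUs.
lowest, highest = cost_after_savings(1000.0)
print(f"Same workload on TPU: ${lowest:,.2f} to ${highest:,.2f}")
```

Under that assumed baseline, the same workload would cost roughly $300 to $700, which is the kind of gap a provider can pass on as lower prices.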

Simply put, running the same computational load on the right chip saves a significant amount of money, and that gap directly lowers the price floor for model service providers. Beyond chips, model architectures themselves are also cutting costs. Caching mechanisms mean repeated queries don't need to be recomputed. Mixture of Experts (MoE) architecture, in layman's terms, lets the model call on only some of its "experts" as needed instead of activating all of its neurons every time, significantly reducing computational overhead while maintaining equivalent accuracy.
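
The MoE idea can be sketched in a few lines. This is a toy illustration of top-k expert routing, not any production model's implementation: the four scalar "experts", the gate scores, and the name `moe_forward` are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_scores, experts, top_k=2):
    """Run only the top_k highest-scoring experts and mix their outputs.

    Returns the mixed output and the indices of the experts that ran.
    The experts that were not chosen are never called, which is where
    the compute saving comes from.
    """
    ranked = sorted(range(len(experts)),
                    key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([gate_scores[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


# Four toy "experts"; a real model would use neural sub-networks instead.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
gate_scores = [0.1, 2.0, 2.0, -1.0]  # hypothetical router output for this input

output, chosen = moe_forward(3.0, gate_scores, experts, top_k=2)
print(output, chosen)
```

Here only two of the four experts run for this input, so roughly half of the expert computation is skipped. In a real model the gate scores come from a learned router rather than a fixed list.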

There's another factor Patadia believes is the most underestimated structural element: zero switching costs.

His comparison is straightforward: the moats of traditional software like Windows, Adobe, and Salesforce lie in the fact that replacing them is extremely costly, often requiring months of migration engineering. AI models have no such moat. AI gateway services like OpenRouter.ai allow developers to switch between model providers in seconds, and systems can even be programmed to switch between providers automatically.

When a vendor can be replaced instantly at any time, any attempt to raise prices will drive its users straight to a competitor.
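
Automatic switching of this kind can be sketched as a cheapest-first fallback loop. This is not OpenRouter's API: the provider names, prices, and stub functions below are hypothetical stand-ins, and a real system would replace the stubs with actual client calls.

```python
from typing import Callable, List, Tuple

# (name, price per million tokens, function that sends the prompt)
Provider = Tuple[str, float, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try providers from cheapest to most expensive.

    Returns (provider_name, reply) from the first provider that succeeds,
    and raises RuntimeError if every provider fails.
    """
    errors = []
    for name, _price, call in sorted(providers, key=lambda p: p[1]):
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stubs standing in for real provider clients.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def stable_provider(prompt: str) -> str:
    return f"echo: {prompt}"


providers = [
    ("provider_b", 2.0, stable_provider),
    ("provider_a", 1.0, flaky_provider),
]
name, reply = complete_with_fallback("hi", providers)
print(name, reply)
```

Because the calling code depends only on a prompt-in, text-out function, changing vendors amounts to editing one entry in the list, which is the low switching cost Patadia describes.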

Local Models: The Ultimate Threat to Subscription Models

Patadia's boldest prediction concerns local models. He estimates that within 4 to 5 years, continued improvements in chip performance, coupled with an inevitable decline in RAM prices, will allow consumer-grade computers and smartphones to run language models locally. He further predicts that mainstream operating systems will ship with built-in model deployment interfaces, enabling local applications to call local models directly.

If this scenario materializes, what does it mean? Cloud models would be needed only for the most complex tasks, such as legal document analysis, long-context reasoning, and cross-database integration. Everyday tasks like code auto-completion, document proofreading, and basic fact-checking would be done locally, eliminating the need for cloud subscriptions costing $20 or even $200 a month.
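
The split described here can be sketched as a simple local-first router. The task labels mirror the examples in this article; the function names and the rule that unknown tasks default to the cloud are assumptions made for illustration.

```python
# Everyday tasks the article expects to run on-device.
LOCAL_TASKS = {
    "code_completion",
    "document_proofreading",
    "basic_fact_checking",
}

# Complex tasks the article expects to stay in the cloud.
CLOUD_TASKS = {
    "legal_document_analysis",
    "long_context_reasoning",
    "cross_database_integration",
}


def route(task: str) -> str:
    """Return 'local' for everyday tasks and 'cloud' for everything else.

    Unknown tasks fall back to the cloud, an assumption made here so that
    nothing is sent to a local model that may not be able to handle it.
    """
    if task in LOCAL_TASKS:
        return "local"
    return "cloud"


def cloud_share(tasks) -> float:
    """Fraction of a workload that would still need a cloud model."""
    tasks = list(tasks)
    if not tasks:
        return 0.0
    return sum(1 for t in tasks if route(t) == "cloud") / len(tasks)


# Hypothetical day of work: three everyday tasks and one complex one.
workload = [
    "code_completion",
    "document_proofreading",
    "basic_fact_checking",
    "legal_document_analysis",
]
print(cloud_share(workload))
```

In this made-up workload only a quarter of the requests would reach a paid cloud model, which illustrates why a flat monthly subscription becomes harder to justify if the prediction holds.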

Of course, Patadia himself notes that this is a "prediction," not a certainty; he calls these his "bold bets," and time will tell. But the five pressures above (diminishing performance gains, rising open-source alternatives, cost reductions from specialized chips, zero switching costs, and local model substitution) are all backed by real-world cases, not just thought experiments.

If Patadia's predictions are correct, that's good news for users. For the AI companies doing the charging, it's a different story.
