Gartner: The cost of inference on large language models will decrease by over 90% by 2030.

According to Gartner, by 2030 the cost of performing inference on a large language model (LLM) with a trillion parameters will fall by more than 90% compared with 2025, allowing generative AI (GenAI) providers to cut costs significantly.

An AI token is the unit of data processed by generative AI models. In this analysis, one token is defined as 3.5 bytes of data, roughly four characters.
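Using the article's ratio of roughly four characters per token, a quick sketch of how one might estimate the token count of a text. The function and its default are illustrative assumptions; real tokenizers vary by model and language.

```python
# Rough token estimate using the article's ~4-characters-per-token ratio.
# Illustrative only: actual tokenizers produce different counts per model.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the number of tokens in a piece of text."""
    return round(len(text) / chars_per_token)

print(estimate_tokens("A" * 1000))  # ~250 tokens for 1,000 characters
```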

Gartner senior analyst Will Sommer said: “These cost reductions will be driven by multiple factors, including improvements in semiconductor and infrastructure efficiency, innovations in model design, increased chip utilization, greater use of specialized inference chips for specific applications, and the deployment of edge devices in particular scenarios.”

Due to these trends, Gartner predicts that by 2030, the cost-effectiveness of large language models will be up to 100 times higher than that of early models of similar scale developed in 2022.

The forecast model results are divided into two semiconductor scenarios:

Cutting-edge scenario: inference costs are modeled assuming the most advanced chips.

Traditional hybrid scenario: inference costs are modeled on a typical mix of existing semiconductors, based on forecast data from Gartner Consulting.

In the “hybrid” scenario, the calculated costs are significantly higher than in the “cutting-edge” scenario.

Forecast scenarios for GenAI inference costs

Cost reductions will not make frontier intelligence mainstream

However, the decline in token prices will not be fully passed on by GenAI service providers to enterprise customers. In addition, frontier-intelligence applications will require far more tokens than today's mainstream applications: an agent model needs 5 to 30 times more tokens per task than a standard GenAI chatbot, and an agent can carry out many more tasks than a human working with a chatbot directly.

While lower per-token costs will make more advanced generative AI capabilities affordable, those advances will also drive a substantial increase in token demand. Because growth in token consumption outpaces the fall in token costs, overall inference spending is expected to rise.
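A back-of-the-envelope illustration of this effect: the 90% cost reduction is from the forecast, while the 30x token multiplier is borrowed from the agent example above; the baseline workload size is an arbitrary assumption.

```python
# Illustrative arithmetic only: baseline workload is hypothetical.
baseline_cost_per_token = 1.0   # normalized 2025 cost
baseline_tokens = 1_000_000     # hypothetical 2025 workload

# Per-token cost falls by 90% (the forecast figure)...
future_cost_per_token = baseline_cost_per_token * (1 - 0.90)
# ...but an agentic workload may consume 30x more tokens (upper agent multiplier).
future_tokens = baseline_tokens * 30

baseline_spend = baseline_cost_per_token * baseline_tokens
future_spend = future_cost_per_token * future_tokens

print(future_spend / baseline_spend)  # ~3.0: total spend roughly triples
```

Even with tokens at a tenth of the price, a 30x rise in consumption leaves the total bill about three times higher.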

Sommer stated: “Chief product officers should not confuse the depreciation of commoditized tokens with the democratization of frontier inference. As the cost of commoditized intelligence approaches zero, the computing resources and systems needed to support advanced reasoning will remain extremely scarce. Chief product officers who mask architectural inefficiencies with cheap tokens today will struggle to scale autonomous systems tomorrow.”

Platforms capable of coordinating and processing workloads across various models will gain value. Routine, high-frequency tasks must be assigned to smaller, more efficient, and domain-specific language models, as these models can perform specific workflows more effectively at a fraction of the cost of general-purpose solutions. High-cost inference for frontier-level models must be strictly limited and reserved for high-profit, complex reasoning tasks.
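The routing pattern described above can be sketched as follows. All model names, the complexity threshold, and the `classify_complexity` heuristic are invented for this sketch and do not come from the Gartner report.

```python
# Hypothetical model-routing sketch: route routine, high-frequency tasks to a
# small domain model and reserve the frontier model for complex reasoning.

ROUTES = {
    "routine": "small-domain-model",  # cheap, efficient, task-specific
    "complex": "frontier-model",      # expensive, reserved for hard reasoning
}

def classify_complexity(task: str) -> str:
    """Toy heuristic: treat long, multi-step prompts as complex."""
    return "complex" if len(task.split()) > 50 else "routine"

def route(task: str) -> str:
    """Pick the cheapest model tier adequate for the task."""
    return ROUTES[classify_complexity(task)]

print(route("Summarize this invoice"))  # small-domain-model
```

In a production router the classifier would typically be a trained model or rule set rather than a word count, but the cost logic is the same: default to the cheap tier and escalate only when necessary.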
