Gartner: The cost of reasoning on large language models will decrease by over 90% by 2030.
According to Gartner, by 2030 the cost of running inference on a trillion-parameter large language model (LLM) will fall by more than 90% compared with 2025, yielding significant savings for generative AI (GenAI) providers.
An AI token is the unit of data processed by generative AI models. In this analysis, one token is treated as 3.5 bytes of data, roughly four characters.
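The byte-to-token conversion above can be sketched as a simple estimator. This is a rough heuristic built only on the article's stated 3.5-bytes-per-token assumption, not a real tokenizer; the function name is hypothetical.

```python
# Rough token accounting under the article's stated assumption:
# 1 token ≈ 3.5 bytes (≈ 4 characters). Not a real tokenizer.
BYTES_PER_TOKEN = 3.5

def estimate_tokens(text: str) -> int:
    """Estimate token count from UTF-8 byte length (heuristic, not exact)."""
    return round(len(text.encode("utf-8")) / BYTES_PER_TOKEN)

print(estimate_tokens("Gartner forecasts falling LLM inference costs."))  # → 13
```

Real tokenizers vary by model and vocabulary, so actual counts will differ; the estimate is only useful for order-of-magnitude cost planning.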
Gartner senior analyst Will Sommer said: “These cost reductions will be driven by multiple factors, including improvements in semiconductor and infrastructure efficiency, innovations in model design, increased chip utilization, greater use of specialized inference chips for specific applications, and the deployment of edge devices in particular scenarios.”
Due to these trends, Gartner predicts that by 2030, the cost-effectiveness of large language models will be up to 100 times higher than that of early models of similar scale developed in 2022.
The forecast model covers two semiconductor scenarios:
Cutting-edge scenario: inference costs are modeled assuming state-of-the-art chips.
Traditional hybrid scenario: costs are modeled on a typical mix of existing semiconductors, based on forecast data from Gartner Consulting.
In the “hybrid” scenario, the calculated costs are significantly higher than in the “cutting-edge” scenario.
[Figure: Forecast scenarios for GenAI inference costs]
Cost reductions will not make frontier intelligence mainstream
However, the decline in token prices for generative AI service providers will not be fully passed on to enterprise customers. Moreover, frontier intelligence applications will require far more tokens than today's mainstream applications: an agent model consumes 5 to 30 times as many tokens per task as a standard generative AI chatbot, and it can carry out many more tasks than a human working with generative AI directly.
While lower per-token costs will make more advanced generative AI capabilities attainable, those advances will also drive a substantial increase in token demand. Because token consumption is expected to grow faster than token costs fall, overall inference costs are expected to rise.
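The tension described above can be made concrete with back-of-the-envelope arithmetic. All dollar figures and token counts below are hypothetical illustrations, not Gartner's numbers; only the 90% price cut and the 5-30x token multiplier come from the article.

```python
# Illustrative arithmetic (hypothetical numbers): even after a 90% drop in
# per-token price, a 5-30x jump in tokens per task can raise total spend.

cost_per_mtoken_2025 = 10.00                          # hypothetical $/1M tokens
cost_per_mtoken_2030 = cost_per_mtoken_2025 * 0.10    # 90% cheaper (article)

chatbot_tokens_per_task = 2_000                       # hypothetical chatbot task
agent_tokens_per_task = chatbot_tokens_per_task * 30  # upper end of 5-30x range

cost_chatbot_2025 = chatbot_tokens_per_task / 1e6 * cost_per_mtoken_2025
cost_agent_2030 = agent_tokens_per_task / 1e6 * cost_per_mtoken_2030

print(f"Chatbot task at 2025 pricing: ${cost_chatbot_2025:.4f}")  # → $0.0200
print(f"Agent task at 2030 pricing:   ${cost_agent_2030:.4f}")    # → $0.0600
```

At the 30x end of the range, the agent task still costs three times more than the chatbot task did before the price cut, which is the mechanism behind rising overall inference costs.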
Sommer stated: “Chief product officers should not confuse the depreciation of commoditized tokens with the democratization of frontier inference. As the cost of commoditized intelligence approaches zero, the computing resources and systems required to support advanced reasoning will remain extremely scarce. Chief product officers who mask architectural inefficiencies with cheap tokens today will struggle to scale autonomous systems tomorrow.”
Platforms that can coordinate and route workloads across multiple models will gain value. Routine, high-frequency tasks should be assigned to smaller, more efficient, domain-specific language models, which can execute specific workflows at a fraction of the cost of general-purpose solutions. High-cost inference on frontier-level models should be strictly limited to high-margin, complex reasoning tasks.
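The routing principle above can be sketched in a few lines. Everything here is a hypothetical illustration: the model names, the complexity score, and the thresholds are assumptions, not a real orchestration API.

```python
# Minimal sketch of model routing (all names and thresholds hypothetical):
# routine tasks go to a small domain model; the frontier model is reserved
# for complex reasoning while a spend budget remains.

def route(task_complexity: float, frontier_budget_remaining: float) -> str:
    """Pick a model tier for a task; complexity is a 0-1 score (assumption)."""
    if task_complexity < 0.7:
        return "small-domain-model"   # cheap, specialized, high-frequency
    if frontier_budget_remaining > 0:
        return "frontier-model"       # scarce, high-cost reasoning
    return "small-domain-model"       # fall back when the budget is exhausted

print(route(0.3, 100.0))  # → small-domain-model
print(route(0.9, 100.0))  # → frontier-model
```

In practice, the complexity score would come from a classifier or heuristics over the request, and the budget would be tracked per tenant or per workflow; the point is that frontier capacity is treated as a rationed resource rather than the default.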