Tokens become "money-consuming beasts" as big companies' compute prices skyrocket


Have you noticed that over the past half year, the mood in the AI community has shifted a bit too fast?

At this time last year, everyone was celebrating a price war over compute: cloud providers were cutting prices like crazy. Alibaba Cloud led the charge with reductions of up to 60%, and Tencent Cloud, Huawei Cloud, and Baidu Cloud all quickly followed suit. The lineup looked like a Double 11 sale. In startup group chats, people posted their bills every day: "Look, I spent only a few cents on a million Tokens!" Back then, everyone thought the spring of AI startups had arrived. Compute was as cheap as cabbage; who couldn't afford to build a large-model application?

Then what happened? The slap in the face came way too fast.

Just last month, the wind flipped 180 degrees. Within 10 days, Google, Amazon, Tencent, Alibaba, and Baidu all released price-hike announcements. How much did they raise? Generally 30% to 50%. The most aggressive was Tencent Cloud, where one core product jumped by more than 400%.

From rock-bottom clearance sales to rocket-like price hikes, less than a year passed in between. What exactly happened? Who is pushing prices up behind the scenes? And more importantly, in this wave of hikes, who has it worst, and who is secretly laughing?

They were “cutting to the bone” last year—so why did everyone raise prices this year?

First, let’s quickly review this “plot twist drama.”

In April 2025, Alibaba Cloud dropped the first bombshell: price cuts of up to 60% on its core products. This was no small tweak; it was essentially "halve it, then discount again." JD Cloud immediately answered, "Go ahead and cut, I'll match you," and Tencent Cloud, Huawei Cloud, and Baidu Cloud all followed suit. For a while the compute market was thick with gun smoke and the price war raged; "lively" doesn't begin to cover it.

So what were the slogans back then? “Make AI affordable” and “democratize compute.” Many startups really believed it. They started burning Tokens with big fanfare and running models at full speed.

However, a free lunch never lasts long.

In January 2026, Amazon's AWS quietly did something, with no press conference and no advance notice: it raised EC2 instance prices by about 15%. Don't be fooled by the modest percentage; its significance is huge, because it was the cloud industry's first price increase in nearly twenty years. Remember, over those twenty years AWS cut prices more than 100 times. Prices only ever went down, never up.

And that was like toppling the first domino.

On March 11, Tencent Cloud followed suit. The input price for its Tencent HY2.0 Instruct model rose from 0.0008 yuan per thousand tokens to 0.004505 yuan per thousand tokens, an increase of 463%, taking it to more than five times the old price. On March 18, Alibaba Cloud announced that prices for compute card products would rise by 5% to 34%, and Baidu Intelligent Cloud followed with increases of 5% to 30%. Large models that had been free for public testing, such as GLM 5, MiniMax 2.5, and Kimi 2.5, ended their freebie period and moved to formal billing.
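A quick back-of-envelope check on the Tencent figure, using the two prices quoted above:

```python
# Sanity-check the quoted Tencent HY2.0 Instruct price hike.
old_price = 0.0008    # yuan per 1,000 input tokens, pre-hike
new_price = 0.004505  # yuan per 1,000 input tokens, post-hike

increase_pct = (new_price - old_price) / old_price * 100
print(f"Increase: {increase_pct:.0f}%")  # matches the reported 463%
```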

From “rushing to lower prices” to “rushing to raise prices”—why did it change so quickly?

On the surface, it is because the cloud providers can't take it anymore. GPU chips keep getting more expensive to buy, and data center electricity bills make up 40% to 60% of operating costs. On top of that, storage chips also started rising in price in the second half of 2025. Cost-side pressure has become real and tangible. But what truly turned price hikes into something the providers had no choice but to do is another, more fundamental reason: compute is simply no longer enough.

Who's been devouring Tokens like crazy? The truth behind 140 trillion

Weren’t we told compute was oversupplied? So how did it suddenly become insufficient?

The answer is: Tokens are being eaten too fast.

According to data disclosed this March by Liu Liehong, head of China's National Data Administration, daily Token call volume in China had surpassed 140 trillion by March 2026.

How outrageous is this number? Here are two benchmarks:

· In early 2024, this number was only 100 billion. In two years, it grew more than a thousandfold.

· At the end of 2025, this number was 100 trillion. That means that just over the past three months, it rose another 40%—the additional volume from only those three months (40 trillion) is 400 times the entire figure from early 2024.
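The growth figures above check out arithmetically. Here is the same math in a few lines (volumes are daily Token calls, taken from the article):

```python
early_2024 = 100e9   # 100 billion tokens/day, early 2024
end_2025 = 100e12    # 100 trillion tokens/day, end of 2025
mar_2026 = 140e12    # 140 trillion tokens/day, March 2026

two_year_growth = mar_2026 / early_2024                   # 1400x in ~2 years
q1_growth_pct = (mar_2026 - end_2025) / end_2025 * 100    # +40% in one quarter
q1_gain_vs_2024 = (mar_2026 - end_2025) / early_2024      # the Q1 gain alone is 400x all of early 2024
print(two_year_growth, q1_growth_pct, q1_gain_vs_2024)
```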

This isn’t linear growth—it’s a tsunami.

So the question is: who is burning through Tokens like crazy?

The answer is just one word: agents.

Starting last year, products led by open-source agents like OpenClaw (commonly called "the lobster" in the community) became wildly popular. AI shifted from a robot that only "chats" to an assistant that can "get work done": booking flights, writing code, making PPTs, analyzing data, and more. Sounds great, right? But the price is that even a simple agent task consumes 10 to 100 times the Tokens of a normal conversation.

For example, suppose you ask an AI to write a web-scraping script. In a normal conversation it hands you a piece of code, you copy it, and you're done, at a cost of a few hundred Tokens. But an agent has to run the code itself, hit errors, debug, and run again, back and forth for a dozen-plus rounds. Token consumption shoots straight into the tens of thousands.

And don't even get started on video generation, one of the ultimate "Token vampires." Some analyses suggest that generating a 1-minute video consumes roughly 1.4M Tokens. Yet video models today charge only a few tenths of a yuan to a few yuan for a 5-second clip; that isn't making money, it's losing money to attract business. With so many users, though, video, music, code, data analysis: every vertical is going all in, and all of them consume Tokens.
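A back-of-envelope on the video economics, using the roughly 1.4M-tokens-per-minute figure above; the clip price is an illustrative assumption within the "few tenths of a yuan" range quoted:

```python
tokens_per_minute = 1_400_000
clip_tokens = tokens_per_minute * 5 / 60  # a 5-second clip: ~117k tokens

clip_price_yuan = 0.5  # assumed retail price per clip (illustrative)
effective_rate = clip_price_yuan / clip_tokens * 1000
print(f"{effective_rate:.4f} yuan per 1k tokens")
```

Under these assumptions the clip sells for roughly 0.004 yuan per thousand tokens, near a text model's input rate even though video inference is far costlier per token, which is exactly why the article calls it losing money to attract business.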

When supply can’t keep up with demand, compute prices naturally rise. This isn’t a conspiracy—it’s raw supply-demand imbalance.

The compute ranking race: big companies feast; small players can’t even get broth

For different people, this price hike means totally different things.

For cloud providers, the price hikes are actually good news. One brokerage ran the numbers: for Alibaba Cloud, every 1% price increase lifts its profit margin by about 1 percentage point. And the data bears this out: Alibaba Cloud's market share hasn't fallen but risen, reaching 36% of China's AI cloud market. In the more granular race for AI call volume, Volcano Engine (ByteDance's cloud unit) is approaching 50%, meaning roughly half of all Token calls in China flow through Volcano Engine's pipes.

Meanwhile, Huawei Cloud and Tencent Cloud’s shares have dipped slightly. The head effect is getting more and more obvious: big companies only get stronger, and resources keep consolidating.

So who has it worst?

Small and mid-sized AI startups, and those new players just entering the market.

The reason is simple: the hikes directly raise their operating costs. When Tokens were cheap, you could experiment freely and tune models freely; it wouldn't cost much anyway. Now that prices have multiplied severalfold, or even more than tenfold, every training run and every inference call has to be weighed carefully.

Even more troublesome, small players have no pricing power. Big customers can sign long-term agreements with cloud vendors to lock in discounted rates. If you're a startup spending only tens of thousands of yuan a year on compute, who is going to negotiate a discount with you? You just pay the post-hike list price.

Many teams that had planned to build AI applications ran the numbers and quietly shelved their plans. Projects already underway either scaled down or pushed on while absorbing losses. And because competition in the end market is fierce, no one dares raise prices on users: the app next door is still free, and the moment you start charging, users flee. In the end, you swallow all of that cost pressure yourself.

An industry practitioner complained to me: “I used to think compute was cheap and the startup barrier was low. Only now did I realize the barrier isn’t lower—it’s that they let you in first, then they shut the door.”

This really is a brutal ranking race. For two decades, cloud providers lived comfortably on the playbook of "win volume with low prices, grab territory, profit later." That era is over. Compute has officially exited its subsidy period and entered commercialized pricing. The contest ahead is not about who is cheaper; it is about who provides more stable service, who has a more complete ecosystem, and who can help enterprises truly squeeze value out of every bit of compute.

And in this ranking race, small players will likely be the ones left standing on the platform as the train pulls away.

Looking back at this past year-plus roller coaster, you’ll find a painful truth:

From "cabbage-price" compute to "rocket-price" compute, this is essentially a snapshot of the AI industry shifting from wild growth to maturity. The free era has ended and value competition has begun. Business models that survive on subsidies will die, while products with real technology, real scenarios, and real users will survive rising compute costs, and may even live better.

The core competitive advantage for AI startups has never been how cheap the compute is. It’s what you do with the compute.

In the compute era, Tokens are indeed expensive. But even more expensive than Tokens is the mind that knows how to use Tokens well.
