Computing power is becoming centralized again: After DeepSeek's price drop, who will control the infrastructure of AI?

— Reflections prompted by Gonka's keynote at LA Hacks 2026

On April 26, DeepSeek announced new pricing for its V4 series API: the cost of cache hits was cut to one-tenth of the launch price across the board, and with a limited-time discount on the Pro tier, processing a million tokens now costs just 0.025 yuan, nearly a hundred times cheaper than a year ago. The A-share computing power sector surged to its daily limit, and market sentiment ran hot.

But behind the cheers, there’s a question no one is directly discussing: as models become cheaper and cheaper, the computational power needed to run them is becoming increasingly concentrated.

The data does not lie. In Q4 2025, the four cloud giants Microsoft, Amazon, Meta, and Google collectively increased capital expenditures by 64% year-over-year to $118.6 billion, and their combined 2026 capex is expected to grow another 53% year-over-year to $570.8 billion. Over the same period, Google raised its 2026 TPU shipment target by 50% to 6 million units, and delivery lead times for Nvidia's H100 series have stretched to several months in some markets.

Pricing power at the model layer is shifting toward developers, but control over the compute layer is rapidly consolidating into the hands of a few giants. This is a subtle but profound contradiction in the AI era.

Against this backdrop, on April 24, 2026, Gonka Protocol co-founders Daniil and David Liberman took the main stage at LA Hacks 2026, UCLA's largest annual hackathon, delivering the keynote to hundreds of top engineers about to enter the industry. The question they posed was especially pointed at this moment: is decentralized compute still possible?

  1. The other side of the price drop

On the surface, DeepSeek V4's price cut is driven by technological progress: a new attention mechanism compresses token dimensions and, combined with DSA sparse attention, sharply reduces compute and memory requirements. But sustained price cuts rest on a premise: somewhere, enough compute must be available at low cost.
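To make the "sparse attention" idea concrete, here is a minimal toy sketch of top-k sparse attention in NumPy. This is a generic illustration of the technique's cost-saving intuition (each query scores against only its strongest keys instead of the full matrix), not DeepSeek's actual DSA implementation; the function name and shapes are assumptions for the example.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Toy top-k sparse attention: each query row attends only to its
    top_k highest-scoring keys and ignores the rest."""
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (n_q, n_k)
    # Threshold = each row's top_k-th largest score; mask everything below it.
    thresh = np.partition(scores, -top_k, axis=-1)[:, [-top_k]]
    masked = np.where(scores >= thresh, scores, -np.inf)  # exp(-inf) -> 0
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                    # (n_q, d_v)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))    # 4 queries, dim 8
k = rng.normal(size=(16, 8))   # 16 keys
v = rng.normal(size=(16, 8))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (4, 8)
```

In a real implementation the savings come from never materializing the masked scores at all; this dense-then-mask version only demonstrates the selection logic.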

The reality is that this "sufficient" compute is rapidly converging into a handful of nodes worldwide. Michael Hurlston, CEO of optical communications leader Lumentum, recently stated that on current trends the company's capacity will be nearly sold out by 2028. This is not an isolated problem but a collective strain across the entire AI infrastructure supply chain under soaring demand.

In his LA Hacks talk, Daniil used a simple but striking comparison: the Bitcoin network's compute already exceeds the combined data centers of Google, Microsoft, and Amazon, yet all of it goes into solving hash puzzles whose answers nobody needs. Likewise, the world's idle GPU compute, from graphics cards in gaming PCs to university lab servers to the spare capacity of small and mid-sized cloud providers, is enormous, but without a coordination mechanism it cannot be put to work on AI inference.

Gonka aims to solve exactly this coordination problem: using proof-of-work-style incentives to organize the world's scattered idle GPUs into a network that can serve real AI inference workloads.

  2. Inference as the new battleground

DeepSeek's price cuts have sparked another round of "AI democratization" talk on the Chinese internet. But there is an easily overlooked detail: what fell is the cost per call, not the cost of compute. As AI applications scale, inference call volume grows exponentially; industry forecasts suggest that by 2026, inference will account for roughly two-thirds of global AI compute consumption.

What does this mean? Cutting the cost per call by an order of magnitude does not shrink total compute demand; it expands it, because cheaper calls induce disproportionately more usage, a classic Jevons-paradox dynamic. The "democratization" of large models thus, to some extent, accelerates the centralization of compute: only players holding massive compute can sustain inference services at razor-thin margins.
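The arithmetic behind this claim is simple enough to write out. All numbers below are hypothetical, chosen only to illustrate the dynamic: a 10x per-call price cut still raises total compute spend whenever usage grows faster than the price falls.

```python
# Hypothetical numbers only: a 10x price cut per call can still
# triple total spend if demand responds strongly enough.
price_before = 0.25        # yuan per million tokens (assumed old price)
calls_before = 1_000_000   # daily calls at the old price (assumed)
spend_before = price_before * calls_before

price_after = price_before / 10   # the 10x price cut
calls_after = calls_before * 30   # assumed demand response: 30x more calls
spend_after = price_after * calls_after

print(spend_after / spend_before)  # 3.0: total spend (and compute) tripled
```

The break-even point is exactly when usage grows as fast as price falls; any elasticity beyond that turns a price cut into net growth in compute demand.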

This is a structural lock-in forming: whoever controls the physical compute for inference holds the key to the true infrastructure of AI. From this perspective, the value of decentralized compute networks is no longer just about “cutting costs by 50%,” but about providing a structural alternative before centralization becomes irreversible.

  3. A real challenge for young builders

The participants at LA Hacks, engineers and product managers from California's top universities, will soon face an unromantic engineering choice: which compute layer do they build their products on?

Whose servers are powering your AI inference?

If that platform adjusts its pricing or access policies, do you have the ability to migrate?

Is the user base you grow creating value for you, or merely handing the platform more bargaining chips?

Developers already lived through these questions in the Web2 era: when an app's fate is deeply tied to a platform's algorithms or distribution rules, "independence" becomes a concept that is constantly redefined. In the AI era, compute dependency replicates this logic at the infrastructure level, and because switching costs are higher, the lock-in effect will only be stronger.

Hackathons as a format carry a natural parallel: building something functional in 36 hours, with minimal resources and maximum speed, is precisely the behavior that decentralized network incentives are designed to reward. Daniil's appearance on the LA Hacks stage wasn't just a pitch for Gonka; it read more like a question posed to this cohort: are you accelerating the centralization trend, or creating new possibilities?

  4. PoW 2.0: an engineering challenge

Gonka redirects proof-of-work incentives from hash computation to AI inference, so that nearly 100% of the network's compute contribution maps to real tasks. The mechanism has one key engineering requirement: inference must be verifiable and reproducible. Given the same model weights, random seed, and input, any node can reproduce the result and check its validity. This is the core engineering challenge in taking Gonka from academic prototype to deployable network.
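The verification loop described above can be sketched in a few lines. This is a conceptual illustration, not Gonka's actual protocol: `run_inference` is a deterministic stand-in for a real model, and the commitment scheme (hashing the full transcript so a second node can re-run the job and compare) is an assumed, simplified design.

```python
import hashlib
import json
import random

def run_inference(weights_hash, seed, prompt):
    """Stand-in for a deterministic model: with fixed weights, seed,
    and input, every honest node must emit the identical output."""
    rng = random.Random(f"{weights_hash}:{seed}:{prompt}")
    return " ".join(str(rng.randint(0, 9)) for _ in range(5))

def commitment(weights_hash, seed, prompt, output):
    """Hash the full transcript so another node can re-run the same
    job and compare commitments instead of trusting the worker."""
    blob = json.dumps([weights_hash, seed, prompt, output])
    return hashlib.sha256(blob.encode()).hexdigest()

# A worker node runs the job and publishes its commitment...
out_worker = run_inference("weights-v1", seed=42, prompt="hello")
c_worker = commitment("weights-v1", 42, "hello", out_worker)

# ...then a verifier independently reproduces it and checks the hash.
out_verifier = run_inference("weights-v1", seed=42, prompt="hello")
assert commitment("weights-v1", 42, "hello", out_verifier) == c_worker
print("verified")
```

The hard part in practice is the determinism assumption itself: real GPU inference must pin down floating-point nondeterminism across heterogeneous hardware before any hash comparison like this can work.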

Economically, the significance of this mechanism is that token value is anchored to physical compute costs rather than market sentiment: nodes that contribute compute earn rewards, developers who consume inference pay for it, and the incentive loop closes without depending on the goodwill of intermediaries.

Of course, technical feasibility is only part of the story. The harder question is: in an era of soaring compute demand and billion-dollar capital expenditures by major players, can a community-organized, distributed compute network truly scale to compete?

Early data from Gonka offers a reference point: less than a year after mainnet launch, the network’s aggregate compute power expanded from 60 H100 equivalents to over 10,000—driven by voluntary participation from hundreds of independent nodes worldwide. This doesn’t prove the scale problem is solved, but it shows the incentive mechanism effectively drove early growth.

  5. The window of opportunity

Historically, infrastructure dominance tends to converge rapidly in early stages—this was true for railroads, the internet, and mobile internet. Each time, some found ways to insert themselves before standards solidified, while others only realized their participation rights had shrunk after centralization was complete.

Where is AI compute infrastructure now? Based on the projected $570.8 billion in capital expenditure by the four cloud giants in 2026, centralization is accelerating; but from developers’ actual usage patterns, a large amount of unintegrated resources still exist on the supply side. This gap is the structural space where decentralized networks can still exist.

Daniil offered a historical comparison: after the dot-com bubble burst in 2000, what remained was not ruins but a global fiber-optic network that carried the next twenty years of the digital economy. When the AI infrastructure investment boom subsides, the compute protocols and incentive mechanisms left standing will form the base layer of the next cycle. The only question is which protocols have underlying logic solid enough to keep running under pressure.

This isn’t about any specific project but about the entire decentralized AI track: can governance designs resist single-point control erosion? Will incentive mechanisms remain effective as scale increases? Can decentralized compute networks succeed simultaneously on technical, tokenomics, and governance levels?

Conclusion

DeepSeek’s price cuts have reignited the narrative of “AI democratization.” But democratized inference calls and democratized compute infrastructure are two different things. The former is happening; whether the latter can happen depends on how many people treat this as an engineering problem worth solving in the coming years, rather than just a catchy story.
