U.S. tech companies are quietly integrating Chinese open-source AI models into their production infrastructure. As the cost of top-tier American model services continues to rise, companies like Coinbase are turning to Chinese open-source models as their default option, significantly reducing AI expenses without suppressing usage.

On Friday evening, Coinbase CEO Brian Armstrong posted on X that the company has set Zhipu’s newly released GLM 5.2 and Beijing Moonshot AI’s Kimi 2.7 as the default models for engineers via its internal LLM gateway. Armstrong stated that with routing optimization and caching improvements, Coinbase has cut AI spending by "nearly half," while token usage continues to grow exponentially.

Cost Advantage of Chinese Open-Source Models Takes Center Stage

In his post, Armstrong noted that 91% of engineers had never hit the original usage caps, so rather than lowering the caps or adding spending alerts, Coinbase opted to "switch to cheaper default models."

Both GLM 5.2, from Zhipu, and Kimi 2.7, from Beijing Moonshot AI, are open-weight models. Armstrong said they are deployed for routine tasks, while engineers can still use frontier models for tasks that require complex planning. His reasoning: using top-tier models for execution is often "overkill."

For code review, Coinbase runs multiple models in parallel so that they cross-check each other's outputs and maintain quality standards.
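The post does not describe how the cross-checking is implemented. One plausible reading, sketched below in Python, is agreement voting: every reviewer sees the same diff, and only findings reported by several reviewers are kept. The `cross_check` function, the `min_agreement` threshold, and the stand-in reviewers are all hypothetical; real reviewers would call different models.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter


def cross_check(diff, reviewers, min_agreement=2):
    """Run all reviewers on the same diff in parallel and keep the
    findings that at least `min_agreement` reviewers report."""
    if not reviewers:
        return []
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        results = list(pool.map(lambda review: review(diff), reviewers))
    counts = Counter()
    for findings in results:
        # Use a set so one reviewer cannot count the same finding twice.
        counts.update(set(findings))
    return sorted(f for f, n in counts.items() if n >= min_agreement)


# Stand-in reviewers (hypothetical); each would wrap a different model.
def reviewer_a(diff):
    return ["unchecked-error", "unused-import"]


def reviewer_b(diff):
    return ["unchecked-error"]


def reviewer_c(diff):
    return ["unchecked-error", "naming"]


confirmed = cross_check("example diff", [reviewer_a, reviewer_b, reviewer_c])
print(confirmed)
```

Findings raised by a single reviewer are dropped here; a stricter setup might instead escalate them to a frontier model or a human.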

Three-Layer Infrastructure Restructuring Drives Cost Reduction

Armstrong outlined three core measures.

First, intelligent routing: In a custom scheduling framework, the system preprocesses prompts and combines cache hit rates with model pricing to automatically distribute tasks to the most suitable and cost-effective model. He noted that the ultimate goal is to have AI, rather than humans, handle model selection.
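As a rough illustration of routing on cache hit rate and price, the Python sketch below picks the cheapest model that meets a required capability tier. It is not Coinbase's framework: the model names, prices, hit rates, and the two-tier `capability` field are made-up assumptions, and a real router would first classify the prompt to decide which tier it needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    input_price: float         # USD per million uncached input tokens (made up)
    cached_input_price: float  # USD per million cached input tokens (made up)
    output_price: float        # USD per million output tokens (made up)
    capability: int            # hypothetical tier: 1 = routine, 2 = complex planning


def estimated_cost(model, input_tokens, output_tokens, cache_hit_rate):
    """Expected USD cost of one request at a given cache hit rate."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    return (
        uncached * model.input_price
        + cached * model.cached_input_price
        + output_tokens * model.output_price
    ) / 1_000_000


def route(models, required_capability, input_tokens, output_tokens, hit_rates):
    """Pick the cheapest model that meets the required capability tier."""
    eligible = [m for m in models if m.capability >= required_capability]
    if not eligible:
        raise ValueError("no model meets the required capability")
    return min(
        eligible,
        key=lambda m: estimated_cost(
            m, input_tokens, output_tokens, hit_rates.get(m.name, 0.0)
        ),
    )


MODELS = [
    ModelOption("open-weight-default", 0.60, 0.10, 2.00, capability=1),
    ModelOption("frontier", 3.00, 0.30, 15.00, capability=2),
]
HIT_RATES = {"open-weight-default": 0.60, "frontier": 0.60}

routine = route(MODELS, 1, 20_000, 1_000, HIT_RATES)
planning = route(MODELS, 2, 20_000, 1_000, HIT_RATES)
print(routine.name, planning.name)
```

With these illustrative numbers, routine work goes to the cheaper open-weight model and only requests tagged as complex planning reach the frontier model.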

Second, aggressive caching: Coinbase requires all requests to be cache-aware and to reuse existing caches wherever possible. With LibreChat, for example, the cache hit rate jumped from 5% to 60% once caching was properly implemented.
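To see why the hit rate matters, the Python sketch below computes relative input cost before and after the jump from 5% to 60%. The post does not say how much cheaper a cached token is, so `ASSUMED_CACHED_PRICE_RATIO` is a hypothetical value chosen only for illustration, and treating the hit rate as a share of input tokens is likewise an assumption.

```python
def effective_input_cost(hit_rate, cached_price_ratio):
    """Relative input cost versus no caching (1.0 = full price).

    hit_rate: fraction of input tokens served from cache (0..1).
    cached_price_ratio: price of a cached token as a fraction of the
        uncached price (an assumed value, not from the post).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cached_price_ratio


ASSUMED_CACHED_PRICE_RATIO = 0.10  # hypothetical discount for cached tokens

before = effective_input_cost(0.05, ASSUMED_CACHED_PRICE_RATIO)
after = effective_input_cost(0.60, ASSUMED_CACHED_PRICE_RATIO)
print(f"before: {before:.3f}, after: {after:.3f}, saving: {1 - after / before:.1%}")
```

Under that assumed ratio, input cost per request falls by roughly half; a different discount would give a different figure.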

Third, context streamlining: Armstrong recommends starting new sessions when switching tasks, narrowing the scope of file context, and disconnecting unused tools. He emphasized that the goal is not to reduce total token usage but to reduce "wasted tokens."

Efficiency First, Not Usage Suppression

Armstrong characterized this cost compression as a prerequisite for scaling AI adoption, not a limitation. He said engineers are still free to use any number of tokens and any model, but the company has made usage data visible and tied usage to business impact: "spend more, we expect more impact."

He did not disclose absolute spending figures. Structurally, though, cutting costs by nearly half while usage grows exponentially indicates that Coinbase has partially decoupled consumption from cost.
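The decoupling can be made concrete with simple arithmetic: total spend equals tokens used times the average cost per token, so if spend falls while usage rises, the average cost per token must fall faster than spend does. The Python sketch below uses a hypothetical 2x usage growth, since the post gives no absolute figures.

```python
def unit_cost_ratio(spend_ratio, usage_ratio):
    """New average cost per token as a fraction of the old one.

    spend_ratio: new total spend divided by old total spend.
    usage_ratio: new token usage divided by old token usage.
    """
    if usage_ratio <= 0:
        raise ValueError("usage_ratio must be positive")
    return spend_ratio / usage_ratio


# Illustrative only: spend cut to about half, with a hypothetical 2x usage growth.
ratio = unit_cost_ratio(0.5, 2.0)
print(ratio)  # 0.25
```

In that illustrative case, each token would cost a quarter of what it did before.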

Armstrong concluded that the approach is general: any enterprise can replicate it to scale AI sustainably without cost becoming a ceiling.

