American tech companies are quietly integrating Chinese open-source AI models into their production infrastructure. As the cost of top-tier American model services continues to rise, companies such as Coinbase are making Chinese open-source models their default option, cutting AI spending significantly without restricting usage.

Coinbase CEO Brian Armstrong posted on X late Friday night that the company has set GLM 5.2 from Zhipu AI and Kimi 2.7 from Beijing Moonshot AI as the default models for engineers through its internal LLM gateway. Armstrong stated that, with routing optimization and caching improvements, Coinbase has cut AI costs "by nearly half," while token usage continues to grow at an exponential rate.

Cost advantages of Chinese open-source models take center stage

In his post, Armstrong noted that 91% of engineers never hit their original usage limits, so instead of lowering the cap or adding usage alerts, Coinbase chose to switch to "cheaper default models."

GLM 5.2 and Kimi 2.7 are both open-weight models. Armstrong said they are deployed for routine tasks, while engineers can still use frontier models for tasks that require complex planning. His reasoning: using top-tier models for execution-level tasks is often "overkill."

For code review, the company adopts a multi-model parallel strategy, where different models cross-check each other's outputs to maintain quality standards.
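The post does not describe how the cross-checking works. As one possible reading, the sketch below runs several reviewers in parallel and keeps only the issues that at least two of them flag. The reviewer functions, the issue labels, and the `min_votes` threshold are all hypothetical stand-ins, not Coinbase's implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A reviewer takes a diff and returns a set of issue labels.
# In practice each reviewer would wrap a call to a different model.
Reviewer = Callable[[str], set[str]]


def cross_check(diff: str, reviewers: list[Reviewer], min_votes: int = 2) -> set[str]:
    """Run all reviewers in parallel and keep issues flagged by >= min_votes."""
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        results = list(pool.map(lambda review: review(diff), reviewers))
    votes = Counter(issue for found in results for issue in found)
    return {issue for issue, count in votes.items() if count >= min_votes}


# Hypothetical stub reviewers standing in for three different models.
def reviewer_a(diff: str) -> set[str]:
    return {"unused-import", "missing-null-check"}


def reviewer_b(diff: str) -> set[str]:
    return {"missing-null-check"}


def reviewer_c(diff: str) -> set[str]:
    return {"missing-null-check", "naming"}


agreed = cross_check("example diff", [reviewer_a, reviewer_b, reviewer_c])
```

Requiring agreement trades recall for precision: an issue seen by only one model is dropped, which may or may not match the quality bar a real review pipeline wants.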

Three-tier infrastructure overhaul drives cost reduction

Armstrong outlined three core measures.

First is intelligent routing: in a custom scheduling framework, the system preprocesses prompts, weighs cache hit rates against model pricing, and automatically routes each task to the most suitable and cost-effective model. He said the ultimate goal is for AI, rather than humans, to perform model selection.
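To make the routing idea concrete, here is a minimal sketch that blends cached and uncached prices by an expected cache hit rate, then picks the cheapest model that is capable enough for the task. The model labels, prices, hit rates, and the two-level capability scale are illustrative assumptions; the post does not disclose Coinbase's actual routing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                     # hypothetical label, not a real model ID
    price_per_mtok: float         # assumed USD per million uncached input tokens
    cached_price_per_mtok: float  # assumed USD per million cached input tokens
    cache_hit_rate: float         # expected fraction of input tokens served from cache
    capability: int               # 1 = routine execution, 2 = complex planning


def effective_price(model: ModelOption) -> float:
    """Blend cached and uncached prices by the expected cache hit rate."""
    hit = model.cache_hit_rate
    return hit * model.cached_price_per_mtok + (1.0 - hit) * model.price_per_mtok


def route(required_capability: int, options: list[ModelOption]) -> ModelOption:
    """Pick the cheapest model whose capability meets the requirement."""
    eligible = [m for m in options if m.capability >= required_capability]
    if not eligible:
        raise ValueError("no model meets the required capability")
    return min(eligible, key=effective_price)


# Illustrative numbers only.
OPTIONS = [
    ModelOption("open-weight-default", 1.0, 0.2, 0.6, 1),
    ModelOption("frontier", 10.0, 1.0, 0.6, 2),
]

routine_choice = route(1, OPTIONS)
planning_choice = route(2, OPTIONS)
```

In this sketch a routine task lands on the cheaper default, while a task marked as needing complex planning can only be served by the frontier option. Deciding the required capability automatically is the harder part and is left out here.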

Second is aggressive caching: Coinbase requires all requests to be cache-aware, reusing existing caches as much as possible. For example, after proper caching was implemented for LibreChat, its cache hit rate jumped from 5% to 60%.
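A rough calculation shows why the hit rate matters. The sketch below assumes cached input tokens are billed at 10% of the normal input price; that ratio is an assumption chosen for illustration, since the post gives no pricing details, and real ratios vary by provider and deployment.

```python
def relative_input_cost(hit_rate: float, cached_price_ratio: float = 0.1) -> float:
    """Input cost per token relative to a fully uncached request (1.0 = full price).

    cached_price_ratio is an ASSUMED discount: cached tokens cost 10% of the
    normal input price. It is not a figure from the post.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cached_price_ratio


before = relative_input_cost(0.05)  # 5% hit rate, as cited for LibreChat
after = relative_input_cost(0.60)   # 60% hit rate after the caching work
savings = 1.0 - after / before      # fraction of input cost removed
```

Under the assumed ratio, relative input cost falls from about 0.955 to 0.46, a reduction of roughly 52% on input tokens alone. Output tokens are not affected by prompt caching, so this should not be read as an explanation of Coinbase's overall figure.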

Third is streamlined context: Armstrong advises starting new sessions when switching tasks, narrowing file context ranges, and disconnecting unused tools. He emphasized that the goal is not to reduce total token usage, but to reduce "wasted tokens."

Efficiency first, not usage suppression

Armstrong characterized this cost compression as a prerequisite for expanding AI adoption, not a restriction. He said engineers remain free to use as many tokens and whichever models they want, but the company has made usage data visible and tied usage to business impact: "the more you spend, the greater the impact we expect."

He did not disclose absolute spending figures. Still, cutting costs nearly in half while usage grows exponentially suggests that Coinbase has, to some extent, decoupled consumption from cost.
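The decoupling claim can be expressed as simple arithmetic: if total cost is tokens multiplied by a blended price per token, then the change in blended price is the cost ratio divided by the usage ratio. The usage ratio below is a hypothetical placeholder, since the post gives no usage figures.

```python
def implied_unit_price_ratio(cost_ratio: float, usage_ratio: float) -> float:
    """Ratio of new to old blended price per token.

    cost_ratio:  new total cost / old total cost
    usage_ratio: new token usage / old token usage
    """
    if usage_ratio <= 0:
        raise ValueError("usage_ratio must be positive")
    return cost_ratio / usage_ratio


# Cost "nearly half" is taken as 0.5; a doubling of usage is HYPOTHETICAL.
ratio = implied_unit_price_ratio(cost_ratio=0.5, usage_ratio=2.0)
```

With these placeholder inputs the blended price per token would have to fall to a quarter of its earlier level. The faster usage actually grew, the larger the implied drop.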

Armstrong concluded that this methodology is universal and can be adopted by any enterprise to sustainably scale AI usage without making cost a ceiling.


