US tech companies quietly shift to Chinese AI models: Coinbase leads the way with GLM and Kimi


American tech companies are quietly integrating Chinese open-source AI models into their production infrastructure. As the cost of top-tier American model services continues to rise, enterprises like Coinbase are adopting Chinese open-source models as their default option, significantly reducing AI expenses without suppressing usage.

Coinbase CEO Brian Armstrong announced on X late Friday night that the company has made GLM 5.2 from Zhipu AI and Kimi 2.7 from Beijing-based Moonshot AI the default models for engineers in its internal LLM gateway. Armstrong said that routing optimization and caching improvements have let Coinbase cut AI costs "by nearly half," even as token usage continues to grow exponentially.

Cost advantages of Chinese open-source models take center stage

In his post, Armstrong noted that 91% of engineers never hit their original usage limits, so instead of lowering the cap or adding usage alerts, Coinbase chose to switch to "cheaper default models."

Both GLM 5.2 and Kimi 2.7 are open-weight models. Armstrong said they handle routine tasks, while engineers can still use frontier models for work that requires complex planning. His reasoning is that using top-tier models for execution-level tasks is often "overkill."

For code review, the company adopts a multi-model parallel strategy, where different models cross-check each other's outputs to maintain quality standards.
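The article does not describe how the cross-check works. As a minimal sketch, assume each reviewer is a function standing in for a call to one model, returning a set of issue labels for a diff. Issues reported by at least `min_votes` reviewers count as confirmed and the rest as disputed. The names `cross_check`, `min_votes`, and the stub reviewers are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical reviewer signature: takes a diff, returns the set of issues it found.
# In a real system each reviewer would wrap a request to a different model.
Reviewer = Callable[[str], set]


def cross_check(diff: str, reviewers: list[Reviewer], min_votes: int = 2) -> dict[str, list[str]]:
    """Run all reviewers on the same diff in parallel and tally their agreement."""
    if not reviewers:
        return {"confirmed": [], "disputed": []}
    # Model calls are I/O-bound, so a thread per reviewer is enough for parallelism.
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        results = list(pool.map(lambda review: review(diff), reviewers))
    # Count how many reviewers independently reported each issue.
    votes = Counter(issue for found in results for issue in found)
    return {
        "confirmed": sorted(i for i, n in votes.items() if n >= min_votes),
        "disputed": sorted(i for i, n in votes.items() if n < min_votes),
    }


# Stub reviewers standing in for three different models.
def model_a(diff: str) -> set:
    return {"missing null check", "unused import"}


def model_b(diff: str) -> set:
    return {"missing null check"}


def model_c(diff: str) -> set:
    return {"off-by-one in loop"}


report = cross_check("<diff text>", [model_a, model_b, model_c])
print(report)
# {'confirmed': ['missing null check'], 'disputed': ['off-by-one in loop', 'unused import']}
```

Requiring two of three models to agree is one possible policy. It trades some recall for precision, and disputed findings could still be escalated to a human or a stronger model.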

Three infrastructure changes drive cost reduction

Armstrong outlined three core measures.

First is intelligent routing: a custom scheduling framework preprocesses prompts, weighs cache hit rates together with model pricing, and automatically sends each task to the most suitable, cost-effective model. He said the ultimate goal is for AI, rather than humans, to choose the model.
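Coinbase's scheduling framework is not public. The routing step described above can be sketched under stated assumptions: each model has a capability tier and per-token prices, the router estimates each eligible model's cost using that model's own cache hit rate, and it picks the cheapest one. The model names, prices, tiers, and the keyword-based `required_tier` classifier are all hypothetical placeholders, not real vendor pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    tier: int                  # 1 = routine execution, 2 = complex planning (hypothetical scale)
    input_price: float         # USD per 1M uncached input tokens (placeholder)
    cached_input_price: float  # USD per 1M cached input tokens (placeholder)
    output_price: float        # USD per 1M output tokens (placeholder)


def estimate_cost(model: ModelOption, prompt_tokens: int, output_tokens: int,
                  cache_hit_rate: float) -> float:
    """Expected USD cost of one request, given the share of prompt tokens served from cache."""
    cached = prompt_tokens * cache_hit_rate
    uncached = prompt_tokens - cached
    return (cached * model.cached_input_price
            + uncached * model.input_price
            + output_tokens * model.output_price) / 1_000_000


def required_tier(prompt: str) -> int:
    """Hypothetical preprocessing step: a keyword heuristic stands in for a real task classifier."""
    planning_words = ("design", "architecture", "plan", "migrate")
    return 2 if any(word in prompt.lower() for word in planning_words) else 1


def route(prompt: str, prompt_tokens: int, output_tokens: int,
          cache_hit_rates: dict[str, float], models: list[ModelOption]) -> ModelOption:
    """Pick the cheapest capable model, accounting for how warm each model's cache is."""
    tier = required_tier(prompt)
    eligible = [m for m in models if m.tier >= tier]
    if not eligible:
        raise ValueError(f"no model meets required tier {tier}")
    return min(eligible, key=lambda m: estimate_cost(
        m, prompt_tokens, output_tokens, cache_hit_rates.get(m.name, 0.0)))


open_weight = ModelOption("open-weight-default", 1, 0.60, 0.10, 2.00)
frontier = ModelOption("frontier", 2, 3.00, 0.30, 15.00)
models = [open_weight, frontier]
hit_rates = {"open-weight-default": 0.6}

print(route("Rename this variable across the module", 20_000, 500, hit_rates, models).name)
# open-weight-default
print(route("Plan the database migration", 20_000, 500, hit_rates, models).name)
# frontier
```

Because hit rates are tracked per model, the cost estimate can favor a model whose cache is already warm, which is one way to read "combining cache hit rates with model pricing."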

Second is aggressive caching: Coinbase requires all requests to be cache-aware and to reuse existing caches as much as possible. For example, after proper caching was implemented for LibreChat, its cache hit rate jumped from 5% to 60%.
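The article does not say what change raised LibreChat's hit rate. One widely used approach is provider-side prompt-prefix caching, where a request reuses cached work only if its opening content matches an earlier request exactly. Here is a minimal sketch under that assumption: stable content (system prompt, tool definitions) goes first and is serialized deterministically, while volatile content goes last. `stable_prefix` and `PrefixCacheSimulator` are hypothetical. The simulator is a toy that keys on the whole prefix and does not model any real provider's cache.

```python
import hashlib
import json


def stable_prefix(system_prompt: str, tools: list[dict]) -> str:
    """Serialize the reusable part of a prompt deterministically."""
    # Sort tools by name and keys within each tool so ordering never changes the bytes.
    ordered = sorted(tools, key=lambda t: t["name"])
    return system_prompt + "\n" + json.dumps(ordered, sort_keys=True, separators=(",", ":"))


class PrefixCacheSimulator:
    """Toy cache that records a hit when an identical prefix was seen before."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.hits = 0
        self.requests = 0

    def submit(self, prefix: str, suffix: str) -> bool:
        # Only the prefix is cached; the suffix (e.g. the user message) is ignored by the cache.
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        hit = key in self.seen
        self.seen.add(key)
        self.requests += 1
        self.hits += int(hit)
        return hit

    @property
    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


tools = [{"name": "search", "description": "Search the codebase"},
         {"name": "read_file", "description": "Read a file"}]
questions = ["q1", "q2", "q3", "q4"]

# Cache-aware: identical prefix on every request, even though tool order varies per call.
good = PrefixCacheSimulator()
for i, q in enumerate(questions):
    shuffled = tools if i % 2 == 0 else list(reversed(tools))
    good.submit(stable_prefix("You are a coding assistant.", shuffled), q)

# Cache-hostile: a per-request timestamp in the system prompt makes every prefix unique.
bad = PrefixCacheSimulator()
for i, q in enumerate(questions):
    bad.submit(stable_prefix(f"You are a coding assistant. Time: 12:00:0{i}", tools), q)

print(good.hit_rate, bad.hit_rate)  # 0.75 0.0
```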

Third is streamlined context: Armstrong advises starting a new session when switching tasks, narrowing the set of files kept in context, and disconnecting unused tools. He emphasized that the goal is not to reduce total token usage but to reduce "wasted tokens."
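The three habits Armstrong lists can be expressed as simple context-assembly rules. The sketch below is hypothetical: `Session`, `build_context`, and the four-characters-per-token estimate are assumptions for illustration, not Coinbase's tooling. It starts a fresh session on a task switch, keeps only files under the task's directory, and drops tools the task does not need.

```python
from dataclasses import dataclass, field


def approx_tokens(text: str) -> int:
    # Rough rule of thumb (about 4 characters per token); real tokenizers vary by model.
    return len(text) // 4


@dataclass
class Session:
    task: str
    history: list[str] = field(default_factory=list)

    def switch_task(self, new_task: str) -> "Session":
        # Start a fresh session rather than carrying unrelated history into the new task.
        return self if new_task == self.task else Session(new_task)


def build_context(files: dict[str, str], scope: str,
                  tools: dict[str, str], needed_tools: set[str]) -> dict[str, dict[str, str]]:
    """Keep only files under `scope` and only the tools the task needs."""
    return {
        "files": {path: body for path, body in files.items() if path.startswith(scope)},
        "tools": {name: spec for name, spec in tools.items() if name in needed_tools},
    }


def context_tokens(context: dict[str, dict[str, str]]) -> int:
    """Estimated tokens across all files and tool specs in the context."""
    return sum(approx_tokens(text) for group in context.values() for text in group.values())


files = {"src/payments/ledger.py": "x" * 4000,
         "src/payments/fees.py": "y" * 2000,
         "docs/onboarding.md": "z" * 8000}
tools = {"search": "s" * 400, "read_file": "r" * 400, "browser": "b" * 1200}

full = build_context(files, "", tools, set(tools))
lean = build_context(files, "src/payments/", tools, {"read_file"})
print(context_tokens(full), context_tokens(lean))  # 4000 1600

session = Session("fix fee rounding", history=["...long debugging transcript..."])
session = session.switch_task("write onboarding docs")
print(session.history)  # []
```

In this toy example, the same task sees 60% fewer context tokens. This matches Armstrong's framing: the work is unchanged, and only the wasted tokens are removed.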

Efficiency first, not usage suppression

Armstrong characterized this cost compression as a prerequisite for expanding AI adoption, not a restriction. He said engineers are still free to use any number of tokens and any model, but the company has made usage data visible and linked usage to business impact—"the more you spend, the greater the impact we expect."

He did not disclose absolute spending figures. Still, nearly halving costs while usage grows exponentially suggests that Coinbase has, to some extent, decoupled consumption from cost.
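The decoupling claim can be made concrete with simple arithmetic. Let C be total AI spend, T the number of tokens consumed, and c = C/T the effective cost per token, with subscripts 0 and 1 for before and after the changes. Armstrong gave no absolute figures, so the usage growth factor g is a free parameter here, not a reported number.

```latex
\begin{aligned}
C_1 &\approx \tfrac{1}{2}\,C_0, \qquad T_1 = g\,T_0 \quad (g > 1) \\
\frac{c_1}{c_0} &= \frac{C_1/T_1}{C_0/T_0}
  = \frac{C_1}{C_0}\cdot\frac{T_0}{T_1}
  \approx \frac{1}{2}\cdot\frac{1}{g} = \frac{1}{2g}
\end{aligned}
```

For example, if token usage had only doubled over the same period (g = 2), the effective cost per token would have fallen to roughly a quarter of its earlier level. Faster growth pushes it lower still.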

Armstrong concluded that this methodology is universal and can be adopted by any enterprise to sustainably scale AI usage without making cost a ceiling.

Risk Disclaimer

Market risks exist, and investment requires caution. This article does not constitute personal investment advice, nor does it consider the specific investment objectives, financial situation, or needs of individual users. Users should consider whether any opinions, viewpoints, or conclusions in this article are suitable for their particular circumstances. Any investment based on this is at your own risk.
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.