Alibaba's implicit caching cuts the cost of cached input tokens by 80%, finally offering a fix for the runaway bills of long-context agents.

CoinNetwork
CoinWorld News: The Alibaba Qwen team announced that its flagship model, Qwen3.7-max, will have automatic implicit caching enabled by default on Alibaba Cloud's Bailian platform. Developers do not need to modify any code or set additional parameters to benefit from the cache savings.

Under the new billing mechanism, the system automatically identifies repeated context prefixes across requests. When a cache hit occurs, the input tokens in the hit portion are billed at only 20% of the original unit price, an 80% reduction in input cost for that portion.
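As a rough illustration of the billing rule described above, the following Python sketch computes the input cost of a single request. Only the 20% rate for hit tokens comes from the announcement; the function name, the per-million-token pricing unit, and the sample price are hypothetical and chosen purely for the example.

```python
from decimal import Decimal

# From the announcement: cache-hit input tokens are billed at 20% of the unit price.
CACHE_HIT_PRICE_RATIO = Decimal("0.20")


def input_cost(total_tokens: int, hit_tokens: int, price_per_million: Decimal) -> Decimal:
    """Input cost of one request, given how many input tokens hit the cache.

    price_per_million is a hypothetical unit price per 1,000,000 input tokens;
    the article does not state the real price.
    """
    if not 0 <= hit_tokens <= total_tokens:
        raise ValueError("hit_tokens must be between 0 and total_tokens")
    miss_tokens = total_tokens - hit_tokens
    # Missed tokens are billed in full, hit tokens at the discounted ratio.
    billable_tokens = Decimal(miss_tokens) + Decimal(hit_tokens) * CACHE_HIT_PRICE_RATIO
    return billable_tokens * price_per_million / Decimal(1_000_000)


if __name__ == "__main__":
    price = Decimal("10")  # hypothetical price per million input tokens
    full = input_cost(100_000, 0, price)
    cached = input_cost(100_000, 90_000, price)
    print(f"no cache hit: {full}, 90% of input hits cache: {cached}")
```

Note that the 80% discount applies only to the hit portion, so the saving on the whole request depends on how much of the input is a repeated prefix.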

Implicit caching targets the heavy overhead of long-text and agent scenarios. With its 1-million-token context window, Qwen3.7-max often has to reread large codebases or knowledge documents at high frequency when performing advanced tasks such as autonomous coding.
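To see why repeated reads matter, here is a small Python sketch of an agent that resends the same large prefix on every turn. It rests on simplifying assumptions that the article does not confirm: the first request has no cache hit, every later request hits the cache for the entire shared prefix, and only the fresh tokens of each turn are billed in full. The function name and the token counts are hypothetical.

```python
def agent_billable_tokens(prefix_tokens: int, fresh_tokens: int, turns: int,
                          hit_percent: int = 20) -> tuple[int, int]:
    """Return (without_cache, with_cache) full-price-equivalent input tokens.

    Assumptions (not from the article): turn 1 misses the cache entirely,
    and each later turn hits the cache for the whole shared prefix.
    hit_percent=20 reflects the announced 20% rate for hit tokens.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    per_request = prefix_tokens + fresh_tokens
    without_cache = turns * per_request
    # Later turns pay full price for fresh tokens and the discounted rate for the prefix.
    discounted_prefix = prefix_tokens * hit_percent // 100
    with_cache = per_request + (turns - 1) * (fresh_tokens + discounted_prefix)
    return without_cache, with_cache


if __name__ == "__main__":
    # Hypothetical run: a 200k-token codebase prefix, 2k fresh tokens per turn, 50 turns.
    without_cache, with_cache = agent_billable_tokens(200_000, 2_000, 50)
    saving = 1 - with_cache / without_cache
    print(without_cache, with_cache, f"{saving:.1%}")
```

Under these assumptions the overall saving approaches, but never reaches, 80%, because the first request and the fresh tokens of every turn are still billed at the full rate.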