CoinWorld News: After permanently lowering API prices for Xiaomi's self-developed MiMo-v2.5 large model series, Luo Fuli explained on X the algorithmic mechanisms behind the cost reduction. She said that even with API prices matched to DeepSeek's, Xiaomi's inference engine can still break even under high load, and that the savings come mainly from the hybrid attention architecture and hierarchical KV cache optimization. To support the goal of cutting cache-hit costs by 99%, Xiaomi's inference framework implemented hierarchical KV cache optimization for sliding window attention (SWA). Production testing shows that this optimization increased cached-token capacity fivefold and reduced cache costs by 80%. Luo Fuli stated that low-cost inference services help stimulate demand for terminal intelligence. In her view, large model companies should avoid blind price wars and instead keep actual operating costs below the break-even point through low-level co-design of algorithms and inference systems.
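The relationship between these figures can be illustrated with some rough arithmetic. The Python sketch below is not Xiaomi's implementation: the `HybridConfig` class, the layer counts, window size, and head dimensions are all hypothetical values chosen for illustration. It relies on two assumptions: that an SWA layer only needs to keep keys and values for the most recent window of tokens, and that the cost per cached token on fixed hardware scales inversely with how many tokens fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    """Hypothetical hybrid-attention stack; no value here comes from Xiaomi."""
    global_layers: int        # layers using full (global) attention
    swa_layers: int           # layers using sliding window attention
    window: int               # SWA window size, in tokens
    kv_heads: int             # KV heads per layer
    head_dim: int             # dimension of each head
    bytes_per_value: int = 2  # assumes 16-bit KV storage


def kv_bytes_per_layer_token(cfg: HybridConfig) -> int:
    # One key vector and one value vector per KV head.
    return 2 * cfg.kv_heads * cfg.head_dim * cfg.bytes_per_value


def cache_bytes(cfg: HybridConfig, seq_len: int, swa_aware: bool) -> int:
    """KV cache size for one sequence of seq_len tokens.

    swa_aware=False stores every token for every layer.
    swa_aware=True keeps at most `window` tokens for the SWA layers.
    """
    per = kv_bytes_per_layer_token(cfg)
    global_part = cfg.global_layers * seq_len * per
    swa_tokens = min(seq_len, cfg.window) if swa_aware else seq_len
    swa_part = cfg.swa_layers * swa_tokens * per
    return global_part + swa_part


def cost_reduction_from_capacity_gain(gain: float) -> float:
    # Assumption: cost per cached token is inversely proportional to capacity.
    return 1.0 - 1.0 / gain


if __name__ == "__main__":
    cfg = HybridConfig(global_layers=10, swa_layers=40, window=128,
                       kv_heads=8, head_dim=128)
    naive = cache_bytes(cfg, seq_len=32768, swa_aware=False)
    aware = cache_bytes(cfg, seq_len=32768, swa_aware=True)
    print(f"uniform cache:   {naive / 2**20:.0f} MiB")
    print(f"SWA-aware cache: {aware / 2**20:.0f} MiB")
    print(f"capacity gain:   {naive / aware:.2f}x")
    print(f"cost reduction at 5x capacity: "
          f"{cost_reduction_from_capacity_gain(5):.0%}")
```

Under the inverse-scaling assumption, a fivefold capacity gain corresponds to a cost per cached token of one fifth, which is consistent with the reported 80% reduction. The layer configuration is illustrative only and should not be read as a reconstruction of how the reported figures were obtained.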


ASolitaryRockBeforeTheVolcano

· 12h ago

MiMo's price cut this time is really aggressive. A 99% cost reduction sounds like science fiction, but the SWA optimization does have real substance.

LendingRateAnxiety

· 12h ago

Hybrid attention plus hierarchical caching: after this combination, smaller companies face even greater pressure on inference cost.


Pragmatists

· 12h ago

How is a 5x increase in cache capacity achieved? Are there papers on hierarchical KV caches? I want to read them in detail.


InstantNoodlesWithContracts

· 12h ago

Cutting costs through algorithm and system co-design is the right approach; price cuts alone have no future. Luo Fuli sees this clearly.

PocketValidator

· 13h ago

Still breaking even after matching DeepSeek's prices shows that the original pricing did leave room for margin. This now looks like a return to a reasonable level.


Luo Fuli reveals MiMo's cost-reduction secret weapon: prefill attention computation is reduced to the 10 global GQA layers
