Google Gemini API hit by “cache billing vulnerability”: developer billed nearly 20,000 reais after deleting cached data
A serious API billing anomaly recently surfaced on the Google AI Developer Forum. A developer posted a plea for help, reporting that the Gemini 3 Flash Context Caching feature kept incurring charges of more than 1,000 BRL per hour even after the caches had been deleted through the API and the cache list showed as empty. Within a few days, the accumulated bill approached 20,000 Brazilian reais (roughly several thousand USD). The developer has been forced to disable the Gemini API service entirely to stop the losses, and the incident has raised significant concern in the developer community.
Hidden costs in large-model AI APIs have long worried developers, and Google’s Gemini API now appears to have a "ghost billing" problem. A post on the Google AI Developer Forum titled "Urgent: Huge Cache Cost Increase Issue (Part 2)" reports that backend billing for the Gemini 3 Flash cache service (Context Caching) appears to have run out of control.
Charges continue after cache deletion: more than 1,000 BRL per hour
According to detailed BigQuery billing data provided by developer Danilo_Oliveira, the anomaly began on June 3, 2026. At first, the cost of Gemini 3 Flash "cache text storage token hours" (SKU ID: 583D-5DB6-4555) held at around 20 to 30 BRL per hour, on usage of about 4 million token hours.
By June 6, the situation had deteriorated sharply. Usage in a single hour exceeded 200 million token hours, and hourly charges passed 1,000 BRL. By early June 7, 341 abnormal billing records had pushed the total bill to 17,847.21 BRL, suggesting that the billing system was no longer tracking the developer’s actual cache usage.
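The figures above allow a rough sanity check. The sketch below divides the reported hourly cost by the reported token hours to get an implied unit price. The inputs are the article's rounded numbers (the spike figures are lower bounds), and the helper name is hypothetical, so the results are order-of-magnitude estimates only.

```python
def implied_rate_brl_per_million(cost_brl: float, token_hours: float) -> float:
    """Implied price in BRL per one million token hours."""
    return cost_brl / (token_hours / 1_000_000)


# Early period: about 20 to 30 BRL per hour on about 4 million token hours.
early_low = implied_rate_brl_per_million(20, 4_000_000)
early_high = implied_rate_brl_per_million(30, 4_000_000)

# Spike: more than 1,000 BRL per hour on more than 200 million token hours.
spike = implied_rate_brl_per_million(1_000, 200_000_000)

# Ratio of metered usage between the two periods.
usage_growth = 200_000_000 / 4_000_000

print(f"early: {early_low:.1f} to {early_high:.1f} BRL per million token hours")
print(f"spike: {spike:.1f} BRL per million token hours")
print(f"usage growth: {usage_growth:.0f}x")
```

On these rounded inputs the implied unit price stays in the same range before and during the spike, while metered usage grows by a factor of about 50. That is consistent with the reported symptom being runaway metered storage rather than a change in price.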
Emergency API shutdown to stop the losses
Faced with the soaring bill, the developer took every step available. They immediately stopped the scripts that created caches and used Google’s official REST API to confirm that the cache list had been "completely cleared." Yet even with no caches listed, the backend continued to bill for cache storage.
Suspecting a bug in which Google’s backend failed to purge the cache records, the developer opened billing issue ticket #720261 to request official assistance. To keep the charges from growing, they ultimately took the drastic step of disabling the Gemini API service entirely within their Google Cloud project.
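A check of this kind could look roughly like the following. This is a minimal sketch, not the developer's actual script. It assumes the Gemini Developer API `cachedContents` list endpoint with API-key authentication; a project using Vertex AI would need a different endpoint and credentials. The function names and the injectable `fetch` parameter are illustrative.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Optional

# Assumed endpoint: Gemini Developer API, v1beta cachedContents collection.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/cachedContents"


def http_get_json(url: str, api_key: str) -> dict:
    """GET a URL with the API key in a header and decode the JSON body."""
    req = urllib.request.Request(url, headers={"x-goog-api-key": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def list_cache_names(
    api_key: str, fetch: Callable[[str, str], dict] = http_get_json
) -> List[str]:
    """Return the names of all cached contents, following pagination."""
    names: List[str] = []
    page_token: Optional[str] = None
    while True:
        url = BASE_URL
        if page_token:
            url += "?" + urllib.parse.urlencode({"pageToken": page_token})
        page = fetch(url, api_key)
        names.extend(item["name"] for item in page.get("cachedContents", []))
        page_token = page.get("nextPageToken")
        if not page_token:
            return names
```

An empty list from `list_cache_names(api_key)` only shows that the API reports no remaining caches. As this incident suggests, it does not by itself prove that billing has stopped.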
Developer community alarmed; caution advised when using cache features
Once the incident surfaced on the forum, it quickly drew attention and discussion among other developers. Context Caching was designed to reduce the cost and latency of processing very long texts with large language models (LLMs), yet in this case it became a drain on funds. The incident is a setback for enterprises and individual developers preparing to adopt the Gemini API at scale.
Until Google fixes the backend issue and explains it publicly, the community strongly recommends that developers using the Gemini API cache feature monitor their Google Cloud billing closely and set strict budget caps and alerts to avoid waking up to an unaffordable bill.
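For teams that already export billing data, one simple form of monitoring is to flag any hour whose cost jumps well above the recent baseline. The sketch below assumes hourly cost rows have already been pulled from a billing export into `(hour_label, cost)` pairs. The function name, thresholds, and sample values are illustrative and are not taken from the developer's data.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple


def flag_spikes(
    rows: Sequence[Tuple[str, float]],
    window: int = 24,
    factor: float = 5.0,
    hard_cap: Optional[float] = None,
) -> List[str]:
    """Flag hours whose cost exceeds `factor` times the median of the
    preceding `window` hours, or exceeds `hard_cap` when one is given."""
    flagged = []
    for i, (hour, cost) in enumerate(rows):
        history = [c for _, c in rows[max(0, i - window):i]]
        over_cap = hard_cap is not None and cost > hard_cap
        over_baseline = bool(history) and cost > factor * median(history)
        if over_cap or over_baseline:
            flagged.append(hour)
    return flagged


# Illustrative values only, shaped like the pattern described in the article.
sample = [("h0", 22.0), ("h1", 25.0), ("h2", 28.0), ("h3", 240.0), ("h4", 1050.0)]
print(flag_spikes(sample))
```

A check like this only raises a flag. Stopping the spend still requires a separate action, such as the API shutdown the developer resorted to.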