Google (GOOGL.US) updates Gemini API pricing, with billing based on inference usage tiers

LootboxPhobia · 2026-04-03T06:55:12+00:00

Google has updated the billing tiers for the Gemini API, which now include Standard, Flex, Priority, Batch, and Caching, to help users balance the cost and efficiency of inference services. The Flex and Batch tiers offer a 50% discount but have higher latency; the Priority tier provides faster responses and is suited to real-time applications.

According to the Zhitong Finance APP, Google (GOOGL.US) has recently updated the billing tiers for the Gemini API, with both the optimization options and the pricing set according to actual inference usage needs.

The newly added inference service tiers include: Standard, Flex, Priority, Batch, and Caching.

Google said: “The Gemini API provides various optimization mechanisms that can achieve a balance among runtime speed, usage costs, and service stability based on specific business workload needs. Whether you are building a real-time dialogue robot or running large-scale offline data processing workflows, choosing the right operating mode can significantly reduce costs or improve operating efficiency.”

The Flex inference tier uses compute resources that sit idle during off-peak hours to offer a 50% discount on the Standard price. Its target latency is 1 to 15 minutes, but no latency guarantee is provided. The Batch API tier also offers a 50% discount on the Standard price, with latency of up to 24 hours.
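The discount arithmetic for these two tiers can be sketched as follows. This is only an illustration: the 50% figures come from the article, while the function name `discounted_cost` and the sample workload cost are hypothetical and not part of any Google API.

```python
# Discounts relative to the Standard price, as stated in the article:
# Flex and Batch are both 50% off. Standard is the baseline (no discount).
DISCOUNT_BY_TIER = {"standard": 0.0, "flex": 0.5, "batch": 0.5}


def discounted_cost(standard_cost: float, tier: str) -> float:
    """Return the cost of a workload on `tier`, given its cost at the Standard price."""
    if standard_cost < 0:
        raise ValueError("standard_cost must be non-negative")
    if tier not in DISCOUNT_BY_TIER:
        raise ValueError(f"unknown tier: {tier!r}")
    return standard_cost * (1.0 - DISCOUNT_BY_TIER[tier])


if __name__ == "__main__":
    standard_cost = 40.0  # hypothetical workload cost at the Standard price
    for tier in DISCOUNT_BY_TIER:
        print(f"{tier}: {discounted_cost(standard_cost, tier):.2f}")
```

The two discounted tiers cost the same in this sketch; the difference between them is latency (a 1 to 15 minute target for Flex versus up to 24 hours for Batch), which the code does not model.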

The Caching tier is billed according to the number of cached tokens and how long they are stored. It is recommended for scenarios such as chatbots with complex system instructions, repeated analysis of long video files, and queries over large document sets.
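A rough way to reason about a bill that depends on both token count and storage duration is sketched below. The article gives only the two billing dimensions, so the linear formula, the function name `caching_storage_cost`, and the rate used in the example are all assumptions made for illustration, not Google's published pricing.

```python
def caching_storage_cost(
    cached_tokens: int,
    storage_hours: float,
    price_per_million_token_hours: float,
) -> float:
    """Estimate a storage charge as tokens x duration x rate.

    Assumption: the charge scales linearly with both the number of cached
    tokens and the storage time. The rate is a hypothetical input.
    """
    if cached_tokens < 0 or storage_hours < 0 or price_per_million_token_hours < 0:
        raise ValueError("all inputs must be non-negative")
    millions_of_tokens = cached_tokens / 1_000_000
    return millions_of_tokens * storage_hours * price_per_million_token_hours


if __name__ == "__main__":
    # Hypothetical example: 2M cached tokens kept for 3 hours at a rate of 1.5
    # (in arbitrary currency units per million tokens per hour).
    print(caching_storage_cost(2_000_000, 3.0, 1.5))
```

Under this assumed model, halving either the cached context or the time it is kept halves the storage charge, which is why the tier is most attractive when the same large context is reused many times within a short window.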

The Priority tier is priced 75% to 100% above the Standard price, with latency kept in the millisecond-to-second range. Google recommends this tier for scenarios such as real-time customer service chatbots, real-time fraud detection, and business-critical intelligent assistants.
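To make the Priority premium concrete, the sketch below converts a Standard-price cost into the corresponding Priority range. The 75% and 100% bounds are taken from the article; the function name `priority_cost_range` and the sample cost are hypothetical.

```python
# Premium over the Standard price, as stated in the article: 75% to 100% higher.
PRIORITY_PREMIUM_RANGE = (0.75, 1.00)


def priority_cost_range(standard_cost: float) -> tuple[float, float]:
    """Return the (low, high) cost on the Priority tier for a given Standard cost."""
    if standard_cost < 0:
        raise ValueError("standard_cost must be non-negative")
    low_premium, high_premium = PRIORITY_PREMIUM_RANGE
    return (
        standard_cost * (1.0 + low_premium),
        standard_cost * (1.0 + high_premium),
    )


if __name__ == "__main__":
    low, high = priority_cost_range(100.0)  # hypothetical Standard cost of 100
    print(f"Priority cost: {low:.2f} to {high:.2f}")
```

In other words, a workload that costs 100 at the Standard price would cost between 175 and 200 on Priority, so the tier makes sense mainly where the latency requirement justifies roughly doubling the bill.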
