Google updates Gemini API pricing, charging based on inference usage tiers

Google has recently updated the billing tiers for the Gemini API, with both the tier structure and pricing based on real inference usage requirements. The newly added inference service tiers are: Standard, Flex, Priority, Batch, and Caching.

Among them, the Flex tier leverages idle computing resources during off-peak hours to offer a 50% discount off the standard price, with a target latency of 1 to 15 minutes but no latency guarantee. The Batch tier also offers a 50% discount off the standard rate, with latency of up to 24 hours. The Caching tier is charged based on the number of cached tokens and the storage duration; Google recommends it for scenarios such as chatbots running long system instructions, repeated analysis of long video files, and queries over large document sets. The Priority tier is priced 75% to 100% above the standard price, with latency controllable down to the millisecond-to-second range; Google recommends it for scenarios such as real-time customer service chatbots, real-time fraud detection, and business-critical intelligent assistants.
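The relative pricing of the tiers can be sketched as a simple cost comparison. The base price below is a hypothetical placeholder, not Google's actual rate; only the multipliers (Flex and Batch at 50% of standard, Priority at 1.75x to 2x standard) come from the figures reported above.

```python
# Hypothetical base rate for illustration only -- NOT Google's actual price.
STANDARD_PRICE_PER_M_TOKENS = 10.0  # dollars per million tokens (assumed)

# Multipliers relative to the Standard tier, per the reported pricing.
TIER_MULTIPLIER = {
    "standard": 1.0,
    "flex": 0.5,           # 50% discount; 1-15 min target latency, no guarantee
    "batch": 0.5,          # 50% discount; up to 24 h latency
    "priority_low": 1.75,  # lower bound: 75% above standard
    "priority_high": 2.0,  # upper bound: 100% above standard
}

def cost(tokens: int, tier: str) -> float:
    """Dollar cost for a request of `tokens` tokens on the given tier."""
    return tokens / 1_000_000 * STANDARD_PRICE_PER_M_TOKENS * TIER_MULTIPLIER[tier]

if __name__ == "__main__":
    # Compare a 2M-token workload across tiers.
    for tier in TIER_MULTIPLIER:
        print(f"{tier:>13}: ${cost(2_000_000, tier):.2f}")
```

Under these assumed numbers, the same 2M-token workload costs half as much on Flex or Batch as on Standard, and up to twice as much on Priority.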
