Google (GOOGL.US) updates Gemini API pricing with billing based on inference usage tiers
Zhitong Finance APP has learned that Google (GOOGL.US) recently updated the billing tiers for its Gemini API, with both the optimization options and the pricing structured around actual inference usage needs.
The new inference service tiers are Standard, Flex, Priority, Batch, and Caching.
Google said: “The Gemini API provides various optimization mechanisms that balance runtime speed, usage costs, and service stability according to the needs of a specific business workload. Whether you are building a real-time chatbot or running large-scale offline data processing workflows, choosing the right operating mode can significantly reduce costs or improve operating efficiency.”
The Flex tier uses idle compute capacity during off-peak hours to offer a 50% discount on the Standard price. Its target latency is 1 to 15 minutes, but no latency guarantee is provided. The Batch API tier likewise offers a 50% discount on the Standard rate, with latency of up to 24 hours.
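To make the discount arithmetic concrete, here is a minimal sketch that applies the 50% Flex and Batch discounts to a Standard per-token price. The function name `tier_cost` and the $2.00 per million tokens figure are hypothetical placeholders for illustration only, not published Gemini prices or part of any Google SDK.

```python
# Price multipliers relative to the Standard tier, as described in the article:
# Flex and Batch are both billed at 50% of the Standard rate.
TIER_MULTIPLIER = {
    "standard": 1.0,
    "flex": 0.5,
    "batch": 0.5,
}


def tier_cost(tokens: int, standard_price_per_million: float, tier: str) -> float:
    """Return the cost of processing `tokens` on `tier`.

    `standard_price_per_million` is a hypothetical Standard price per
    million tokens; real Gemini prices vary by model and token type.
    """
    if tier not in TIER_MULTIPLIER:
        raise ValueError(f"unknown tier: {tier}")
    return tokens / 1_000_000 * standard_price_per_million * TIER_MULTIPLIER[tier]


# Example: 10 million tokens at a hypothetical $2.00 per million tokens.
for name in ("standard", "flex", "batch"):
    print(name, tier_cost(10_000_000, 2.0, name))
```

Under these assumptions the Standard job costs $20.00 and the same job on Flex or Batch costs $10.00; the trade-off is latency (a 1 to 15 minute target with no guarantee for Flex, up to 24 hours for Batch).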
The Caching tier is billed according to the number of cached tokens and how long they are stored. Google recommends it for scenarios such as chatbots with complex system instructions, repeated analysis of long video files, and queries over large document collections.
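The article says only that Caching is billed on cached token count and storage duration. The sketch below assumes a simple linear model (tokens × hours × rate). The rate of $1.00 per million tokens per hour and the name `cache_storage_cost` are hypothetical, and any separate charges Google may apply for reading cached tokens are deliberately left out.

```python
def cache_storage_cost(
    cached_tokens: int,
    storage_hours: float,
    price_per_million_tokens_per_hour: float,
) -> float:
    """Estimate the storage cost of a cache under an assumed linear model.

    Assumption: cost grows linearly with both the number of cached tokens
    and the storage duration. The rate argument is a hypothetical
    placeholder, not a published Gemini price.
    """
    if cached_tokens < 0 or storage_hours < 0:
        raise ValueError("tokens and hours must be non-negative")
    return cached_tokens / 1_000_000 * storage_hours * price_per_million_tokens_per_hour


# Example: a 500,000-token system prompt or document set kept for 4 hours.
print(cache_storage_cost(500_000, 4.0, 1.0))
```

Because both factors enter linearly, halving either the cached context or the retention window halves the storage charge under this model, which is why caching suits workloads that reuse the same large context many times within a short period.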
The Priority tier is priced 75% to 100% above the Standard rate, and its latency is kept in the millisecond-to-second range. Google recommends this tier for scenarios such as real-time customer service chatbots, real-time fraud detection, and business-critical intelligent assistants.
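A 75% to 100% premium means Priority costs between 1.75× and 2× the Standard price. This minimal sketch returns that range for a given workload. The helper name `priority_cost_range` and the $2.00 per million tokens Standard price are hypothetical, chosen only to illustrate the arithmetic.

```python
from typing import Tuple

# Premium bounds over the Standard price, per the article (75% to 100% higher).
PRIORITY_PREMIUM_LOW = 0.75
PRIORITY_PREMIUM_HIGH = 1.00


def priority_cost_range(tokens: int, standard_price_per_million: float) -> Tuple[float, float]:
    """Return (low, high) Priority-tier cost for `tokens`.

    `standard_price_per_million` is a hypothetical Standard price per
    million tokens, not a published Gemini rate.
    """
    standard = tokens / 1_000_000 * standard_price_per_million
    return (
        standard * (1 + PRIORITY_PREMIUM_LOW),
        standard * (1 + PRIORITY_PREMIUM_HIGH),
    )


# Example: 10 million tokens at a hypothetical $2.00 per million tokens.
low, high = priority_cost_range(10_000_000, 2.0)
print(low, high)
```

Here a workload that costs $20.00 on Standard would cost $35.00 to $40.00 on Priority, which is the price of moving latency into the millisecond-to-second range.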