Google has recently updated the billing structure of its Gemini API to better match different inference workloads. The update organizes the API into five service tiers: Standard, Flexible, Priority, Batch, and Cache. Users can choose the tier that best fits their cost and latency requirements.

The Standard tier provides baseline inference at the standard price. The Flexible tier runs requests on idle compute capacity during off-peak hours and costs 50% less than Standard. Its target latency is 1 to 15 minutes, but that latency is not guaranteed, so it is best suited to applications without strict timing requirements.
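To make the 50% Flexible discount concrete, here is a minimal cost sketch. The article gives no absolute prices, so the $1.00-per-million-token Standard price below is a hypothetical placeholder, and the sketch uses a single flat per-token rate, ignoring any split between input and output token prices.

```python
# Hypothetical Standard price; the article does not state absolute prices.
STANDARD_PRICE_PER_M_TOKENS = 1.00  # USD per million tokens (assumed)
FLEX_DISCOUNT = 0.5  # Flexible costs 50% less than Standard, per the article


def request_cost(tokens: int, price_per_m: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_m


def flex_vs_standard(tokens: int) -> tuple[float, float]:
    """Return (standard_cost, flexible_cost) for the same workload."""
    standard = request_cost(tokens, STANDARD_PRICE_PER_M_TOKENS)
    flexible = standard * (1 - FLEX_DISCOUNT)
    return standard, flexible


print(flex_vs_standard(4_000_000))  # (4.0, 2.0)
```

Whatever the real Standard price is, the Flexible cost scales as half of it; the trade-off is giving up a guaranteed response time.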

The Batch tier also costs 50% less than Standard, with a maximum latency of 24 hours. It is aimed at large-scale data processing, where users can significantly cut the cost of high-volume queries.
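Because Batch trades a window of up to 24 hours for the same 50% discount, a job only fits the tier if its deadline can absorb that worst case. The sketch below checks that and reports the savings; `batch_plan` is a hypothetical helper, and the Standard cost and timestamps passed to it are illustrative inputs.

```python
from datetime import datetime, timedelta

BATCH_MAX_LATENCY = timedelta(hours=24)  # maximum latency stated in the article
BATCH_DISCOUNT = 0.5  # Batch costs 50% less than Standard


def batch_plan(standard_cost: float, submitted: datetime, deadline: datetime):
    """Return (fits_deadline, batch_cost, savings) for a hypothetical job.

    fits_deadline is True only if results arrive in time even when the job
    takes the full 24-hour maximum.
    """
    fits_deadline = submitted + BATCH_MAX_LATENCY <= deadline
    batch_cost = standard_cost * (1 - BATCH_DISCOUNT)
    return fits_deadline, batch_cost, standard_cost - batch_cost


print(batch_plan(200.0, datetime(2025, 1, 1, 9), datetime(2025, 1, 2, 12)))
# (True, 100.0, 100.0)
```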

The Cache tier is billed by the number of cached tokens and how long they are stored. It is especially suited to chatbots that repeatedly send long, complex instructions, to long-video analysis, and to queries over large document sets. By reusing cached context, users can manage storage and compute resources more efficiently.
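Since Cache billing depends on both token count and storage duration, the storage charge grows linearly in each. The sketch below assumes a simple per-million-tokens-per-hour storage rate; the rate value is hypothetical, and any separate charges for reading cached tokens are left out because the article does not describe them.

```python
def cache_storage_cost(cached_tokens: int, hours: float,
                       rate_per_m_token_hour: float) -> float:
    """Storage cost = (cached tokens in millions) x hours stored x rate.

    rate_per_m_token_hour is an assumed USD price per million cached tokens
    per hour of storage; it is a placeholder, not a published price.
    """
    return cached_tokens / 1_000_000 * hours * rate_per_m_token_hour


# A 2M-token document set kept in the cache for 3 hours at an assumed $1.00 rate.
print(cache_storage_cost(2_000_000, 3, 1.00))  # 6.0
```

Doubling either the cached size or the storage time doubles the storage charge, which is why caching pays off mainly when the same large context is reused many times within the storage window.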

The Priority tier costs 75% to 100% more than Standard but keeps latency in the millisecond-to-second range. It is designed for applications that need real-time responses, such as customer service chatbots, real-time fraud detection, and business-critical intelligent assistants. Google recommends the Priority tier for users with such needs to ensure fast, reliable responses.
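The sketch below shows the Priority premium as a cost range, together with a hypothetical rule of thumb for choosing among the latency-based tiers described above. The `pick_tier` thresholds are an assumption distilled from the article's figures (24 hours for Batch, a 15-minute target for Flexible), not an official Google selection rule, and they treat Flexible's target as if it were a bound even though the article says it is not guaranteed.

```python
from datetime import timedelta

# Priority costs 75% to 100% more than Standard, per the article.
PRIORITY_PREMIUM_MIN = 0.75
PRIORITY_PREMIUM_MAX = 1.00


def priority_cost_range(standard_cost: float) -> tuple[float, float]:
    """Lowest and highest Priority cost for a workload priced at standard_cost."""
    return (standard_cost * (1 + PRIORITY_PREMIUM_MIN),
            standard_cost * (1 + PRIORITY_PREMIUM_MAX))


def pick_tier(latency_tolerance: timedelta, realtime_critical: bool = False) -> str:
    """Hypothetical rule of thumb mapping latency tolerance to a tier name."""
    if realtime_critical:
        # Millisecond-to-second responses justify the Priority premium.
        return "Priority"
    if latency_tolerance >= timedelta(hours=24):
        # Batch: 50% off, results within 24 hours.
        return "Batch"
    if latency_tolerance >= timedelta(minutes=15):
        # Flexible: 50% off; 1 to 15 minutes is a target, not a guarantee.
        return "Flexible"
    return "Standard"


print(priority_cost_range(100.0))  # (175.0, 200.0)
print(pick_tier(timedelta(hours=2)))  # Flexible
```

When a workload tolerates a full day, both Batch and Flexible carry the same discount; the rule prefers Batch there because the article positions it for large-scale processing.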

Key Points:   

🌟 New Gemini API service tiers have been added to meet different user needs.  

⏳ Flexible and Batch tiers offer a 50% discount, suited to latency-tolerant workloads and large-scale data processing.  

⚡ Priority tier delivers millisecond- to second-level responses, suited to real-time applications.