In the fiercely competitive generative AI market, compute and API call costs remain the "lifeblood" that developers care about most. Google has recently delivered a significant perk to its developer ecosystem: the Gemini API free quota has been raised substantially for some accounts, and the tokens-per-minute (TPM) limit for some models has reached 1 million.

According to tester feedback, the adjustment mainly covers the Gemini 2.5 series. The lightweight Gemini 2.5 Flash and Flash-Lite models already reach a throughput of 1 million tokens per minute on some accounts. More appealing still, the free tier keeps its very low barrier to entry, with "no credit card required and no cap on total usage," giving individual developers and startup teams a low-cost space for experimentation.

However, Google's expansion is clearly tiered. Not every user receives the top-level quota, and limits still differ between models. Although the token limit has been relaxed significantly, the requests-per-minute (RPM) limit for each model remains between 15 and 30, and the requests-per-day (RPD) limit is fixed at 1,500. In addition, the Pro model, the high-end option in the series, is not yet available on the free tier.
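
To see how these three limits interact, the arithmetic can be sketched in a few lines of Python. This is only an illustration built on the figures reported above (1 million TPM, 15 to 30 RPM, 1,500 RPD), which vary by account and model. The class and function names are hypothetical and are not part of any Google SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeTierLimits:
    """Free-tier limits as reported in this article (assumed, account-dependent)."""
    tpm: int  # tokens per minute
    rpm: int  # requests per minute
    rpd: int  # requests per day


def max_avg_tokens_per_request(limits: FreeTierLimits) -> int:
    """Largest average request size that still allows the full RPM without exceeding TPM."""
    return limits.tpm // limits.rpm


def minutes_until_daily_cap(limits: FreeTierLimits) -> float:
    """Minutes of sending at the full RPM before the daily request cap is reached."""
    return limits.rpd / limits.rpm


def max_tokens_per_day(limits: FreeTierLimits, tokens_per_request: int) -> int:
    """Upper bound on daily tokens: the smaller of the RPD-based and TPM-based ceilings."""
    by_requests = limits.rpd * tokens_per_request
    by_tokens = limits.tpm * 24 * 60
    return min(by_requests, by_tokens)


if __name__ == "__main__":
    for rpm in (15, 30):
        limits = FreeTierLimits(tpm=1_000_000, rpm=rpm, rpd=1_500)
        print(
            f"RPM={rpm}: "
            f"avg tokens/request before TPM binds = {max_avg_tokens_per_request(limits)}, "
            f"minutes to exhaust RPD at full rate = {minutes_until_daily_cap(limits)}, "
            f"daily token ceiling at 2,000 tokens/request = {max_tokens_per_day(limits, 2_000)}"
        )
```

Under these assumed figures, the request limits bind long before the token limit for typical workloads. At 15 RPM, a request would need to average more than 66,000 tokens before TPM became the constraint, and a client sending at the full rate would use up its 1,500 daily requests in 100 minutes.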

For developers concerned about privacy, note that Google's terms of service explicitly state that it may use prompts and feedback submitted under the free tier for model training. Developers can check their account's specific quota details on the official query page and, given this potential data compliance issue, assess whether to upgrade to a paid tier based on how sensitive their workloads are.

Industry observers believe Google's move aims not only to draw developers into its API ecosystem with generous free quotas, but also to defend its leading position in the inference service market through strong cost-effectiveness as open-source models gain ground. As the free tier continues to expand, the barrier for individual developers building complex AI applications is expected to fall further.