Google has released preview text-to-speech (TTS) models for Gemini 2.5 Flash and Pro, fully replacing the previous version released in May this year. The new models focus on emotional expression, context-adaptive pacing, and multi-character dialogue across 24 languages. Developers can test them for free in Google AI Studio and Playground, with a production release expected in Q1 2025.


Emotional Expression: Switch from "Happy and Optimistic" to "Gloomy and Serious" with one click  

- Style Response: Adjust voice and speed instantly based on prompts like "Happy and Optimistic" or "Gloomy and Serious"  

- Use Cases: Audiobooks, game NPCs, and localized courseware, without the mechanical feel of traditional TTS  

- Demo: The Synergy Intro app lets users try multi-style switching in real time and produces professional-grade voice acting immediately
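
As a rough illustration of prompt-driven style switching, the sketch below pairs the same sentence with the two style labels quoted above. The prompt layout and the `StyledLine` and `build_style_prompt` names are hypothetical, chosen only for this example; the article does not specify how the models expect style directions to be phrased.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyledLine:
    """A line of text paired with a natural-language style direction."""
    style: str  # e.g. "Happy and Optimistic"
    text: str


def build_style_prompt(line: StyledLine) -> str:
    """Prefix the text with a style instruction (hypothetical prompt layout)."""
    style = line.style.strip()
    text = line.text.strip()
    if not style or not text:
        raise ValueError("both style and text must be non-empty")
    return f"Say in a {style.lower()} tone: {text}"


sentence = "The quarterly results are in."
happy = build_style_prompt(StyledLine("Happy and Optimistic", sentence))
gloomy = build_style_prompt(StyledLine("Gloomy and Serious", sentence))
print(happy)
print(gloomy)
```

Only the style direction changes between the two prompts, which keeps the spoken text identical and makes the two renderings easy to compare.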

Rhythm Adaptation: Context-aware speed, making storytelling more vivid  

- Mechanism: Automatically slows down for complex explanations and speeds up for exciting passages, supporting dynamic shifts such as "slow and suspenseful → fast and thrilling"  

- Example: A mystery novel reading can gradually speed up as the plot progresses, with a "click" sound at the turning point to release tension  

- Applications: Product tutorials and marketing videos, replacing monotonous read-aloud delivery
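
The article describes pacing as largely automatic, but a script can still carry explicit pacing directions for passages where the shift matters. The following is a minimal sketch, assuming a bracketed `[Pace: ...]` tag placed before each passage; that tag format and the `build_paced_script` helper are illustrative assumptions, not a documented syntax.

```python
from typing import List, Tuple

# (pacing direction, passage) pairs. The labels are illustrative,
# not an official vocabulary.
Segment = Tuple[str, str]


def build_paced_script(segments: List[Segment]) -> str:
    """Join passages into one prompt, each preceded by its pacing direction."""
    if not segments:
        raise ValueError("at least one segment is required")
    blocks = []
    for pace, passage in segments:
        # Hypothetical tag format: the pacing note sits on its own line.
        blocks.append(f"[Pace: {pace}]\n{passage.strip()}")
    return "\n\n".join(blocks)


script = build_paced_script([
    ("slow and suspenseful", "The hallway was silent. Every door was locked."),
    ("fast and thrilling", "Then the lights went out, and she ran."),
])
print(script)
```

Keeping the directions separate from the passages means the same text can be re-rendered with a different pacing plan without editing the story itself.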

Multi-character + 24 Languages: Consistent across languages, with no character mix-ups  

- Function: Locks each speaker's identity, enabling natural transitions between speakers in a conversation  

- Languages: Covers 24 languages, including English, French, German, Japanese, and Hindi, while preserving the original pitch and style  

- Demo: The Voices from History app mixes English with other languages in historical dialogues while keeping character personalities stable
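
One way to picture "locking" speaker identities is a request that maps every named speaker in a transcript to a fixed voice. The sketch below builds such a request body without sending it. The field names are assumptions modeled on the publicly documented Gemini API request format and should be verified against the current reference; the voice names, transcript, and helper functions are placeholders for illustration.

```python
import json
import re
from typing import Dict, List


def speakers_in(transcript: str) -> List[str]:
    """Return speaker names in order of first appearance.

    Assumes each spoken line looks like "Name: words". Any other line
    containing a colon followed by text would also be treated as a speaker.
    """
    seen: List[str] = []
    for line in transcript.splitlines():
        match = re.match(r"^\s*([^:\n]+):\s+\S", line)
        if match:
            name = match.group(1).strip()
            if name not in seen:
                seen.append(name)
    return seen


def build_request(transcript: str, voices: Dict[str, str]) -> dict:
    """Build a multi-speaker request body (field names are assumptions)."""
    names = speakers_in(transcript)
    missing = [n for n in names if n not in voices]
    if missing:
        # Refuse to build a request in which a speaker has no fixed voice.
        raise ValueError(f"no voice assigned for: {', '.join(missing)}")
    return {
        "contents": [{"parts": [{"text": transcript}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {
                    "speakerVoiceConfigs": [
                        {
                            "speaker": name,
                            "voiceConfig": {
                                "prebuiltVoiceConfig": {"voiceName": voices[name]}
                            },
                        }
                        for name in names
                    ]
                }
            },
        },
    }


# Illustrative mixed-language dialogue; voice names are placeholders.
transcript = (
    "Host: Welcome back to the show.\n"
    "Guest: Merci, c'est un plaisir d'être ici."
)
request = build_request(transcript, {"Host": "Kore", "Guest": "Puck"})
print(json.dumps(request, indent=2, ensure_ascii=False))
```

Validating the speaker-to-voice mapping before the request is sent catches the "mixed up characters" failure early, on the client side.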

Industry Feedback: Subscription rate increased by 20%, cost reduced by 20%  

- Audio platforms: After integration, the multi-speaker mode proved popular: subscription rate rose by 20%, first-month churn fell by 20%, and operating costs dropped by 20%  

- Content studios: Character consistency in English and Hindi comic voice acting received praise, significantly enhancing immersion  

- Platform plan: A low-latency Flash version and a high-quality Pro version will launch simultaneously in Q1 2025, serving both real-time and premium use cases

Next Steps: Dual lines of low-latency Flash and premium Pro  

Google stated that in Q1 2025 it will optimize the low-latency Flash version (first packet in under 300 ms) and the high-quality Pro version (48 kHz sampling) in parallel and open up edge node deployment, targeting real-time scenarios such as podcasts, interactive games, and virtual hosts. AIbase will continue to track its edge node deployment and pricing model updates.
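
To make the two targets concrete, the sketch below measures time to first audio packet on any stream of byte chunks and works out the raw data rate implied by 48 kHz sampling. It is a minimal sketch under stated assumptions: 16-bit mono PCM is assumed (the article gives only the sample rate), and `fake_stream` is a stand-in for a real streaming response.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

SAMPLE_RATE_HZ = 48_000        # Pro target quoted in the article
BYTES_PER_SAMPLE = 2           # assumption: 16-bit PCM
CHANNELS = 1                   # assumption: mono
FIRST_PACKET_BUDGET_S = 0.300  # Flash target quoted in the article


def pcm_bytes_per_second(rate: int = SAMPLE_RATE_HZ,
                         width: int = BYTES_PER_SAMPLE,
                         channels: int = CHANNELS) -> int:
    """Raw PCM data rate for the given format."""
    return rate * width * channels


def time_to_first_packet(chunks: Iterable[bytes],
                         clock: Callable[[], float] = time.monotonic
                         ) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk, that chunk)."""
    start = clock()
    for chunk in chunks:
        if chunk:
            return clock() - start, chunk
    raise RuntimeError("stream ended before any audio arrived")


def fake_stream() -> Iterator[bytes]:
    """Stand-in for a streaming TTS response (not a real API call)."""
    time.sleep(0.05)       # simulated network and synthesis delay
    yield b"\x00" * 960    # 10 ms of audio at 96,000 bytes per second


latency, first = time_to_first_packet(fake_stream())
print(f"data rate: {pcm_bytes_per_second()} bytes/s")
print(f"first packet after {latency * 1000:.0f} ms, "
      f"within budget: {latency < FIRST_PACKET_BUDGET_S}")
```

The clock is injectable so the measurement can be checked deterministically in tests, and the timer starts before iteration begins so that any lazily issued request is counted in the latency.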

Official announcement: https://x.com/GoogleAIStudio/status/1998876411734692107