Gemini TTS 2.5 Officially Launched: Google Introduces Expressive Speech with 24 Languages and One-Click Multi-Character Switching

AIbase基地

Published in AI News · 4 min read · Dec 11, 2025


Google has released preview versions of its Gemini 2.5 Flash and Gemini 2.5 Pro text-to-speech (TTS) models, fully replacing the previous system released in May of this year. The new models focus on emotional expression, context-aware pacing, and multi-speaker dialogue in 24 languages. Developers can try them for free in Google AI Studio and Playground, and a release to production environments is expected in Q1 2026.

Emotional Expression: Switch from "Happy and Optimistic" to "Gloomy and Serious" with one click

- Style response: Adjusts tone of voice and speaking speed on the fly in response to prompts such as "Happy and Optimistic" or "Gloomy and Serious"

- Use cases: Audiobooks, game NPCs, and localized courseware, avoiding the mechanical feel of traditional TTS

- Demo: The Synergy Intro app lets users switch between speaking styles in real time and hear professional-quality voice acting immediately
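A minimal sketch of how a style prompt might be packaged for the Gemini API, assuming the request shape documented for the earlier Gemini TTS preview: a `generateContent` call with `responseModalities: ["AUDIO"]` and a prebuilt voice such as `Kore`. The model ID `gemini-2.5-flash-preview-tts` and the helper `build_tts_request` are assumptions for illustration; the updated models may use different IDs, so check the current docs. The code only builds the JSON body and never touches the network.

```python
import json

# Assumed model ID from the earlier preview; the updated models may be named differently.
MODEL_ID = "gemini-2.5-flash-preview-tts"
# Assumed REST endpoint pattern for the Gemini API (v1beta generateContent).
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)


def build_tts_request(text: str, style: str, voice_name: str = "Kore") -> dict:
    """Build a single-speaker TTS request body (hypothetical helper).

    The style is a plain natural-language instruction placed in front of
    the text, e.g. "Say in a happy and optimistic tone".
    """
    prompt = f"{style}: {text}"
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Ask the model to return audio rather than text.
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {
                    "prebuiltVoiceConfig": {"voiceName": voice_name}
                }
            },
        },
    }


# Same line, two moods: only the style instruction changes.
cheerful = build_tts_request("The results are in.", "Say in a happy and optimistic tone")
somber = build_tts_request("The results are in.", "Say in a gloomy and serious tone")
print(json.dumps(cheerful, indent=2))
```

The voice configuration stays identical between the two requests; switching mood is purely a prompt change. Following the Gemini API's REST examples, the body would be POSTed to `ENDPOINT` with the API key in an `x-goog-api-key` header.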

Rhythm Adaptation: Context-aware speed, making storytelling more vivid

- Mechanism: Automatically slows down for complex explanations and speeds up for exciting passages, supporting dynamic shifts such as "slow and suspenseful → fast and thrilling"

- Example: When a mystery novel is read aloud, the pace can quicken gradually as the plot builds, with a "click" sound at the turning point to release the tension

- Applications: Product tutorials and marketing videos, replacing monotonous narration
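Since pacing is steered through natural-language instructions rather than a dedicated speed parameter, one way to script a "slow and suspenseful → fast and thrilling" arc is to attach a pacing note to each passage. The sketch below assumes the model follows inline notes such as "(slow and suspenseful)"; the `Segment` structure, `build_paced_prompt` helper, and note format are hypothetical illustrations, not a documented Gemini feature.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One passage of narration plus a pacing cue (hypothetical structure)."""
    pace: str
    text: str


def build_paced_prompt(segments: list[Segment]) -> str:
    """Join passages into a single prompt with a pacing note per passage.

    Assumption: the TTS model honours inline directions written in
    parentheses; the exact phrasing here is for illustration only.
    """
    lines = ["Read the following passages, following each pacing note."]
    for seg in segments:
        lines.append(f"({seg.pace}) {seg.text}")
    return "\n".join(lines)


story = [
    Segment("slow and suspenseful", "The hallway was silent. Something moved."),
    Segment("fast and thrilling", "She ran, the door slammed, and the lights went out!"),
]
prompt = build_paced_prompt(story)
print(prompt)
```

The resulting string would be used as the text part of a TTS request; keeping the pacing notes next to each passage makes the intended tempo change explicit at the point where it should happen.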

Multi-speaker + 24 Languages: Consistent across languages, with no mixed-up characters

- Function: Locks in each speaker's identity, enabling natural transitions between speakers in a conversation

- Languages: Covers 24 languages, including English, French, German, Japanese, and Hindi, while preserving each voice's original pitch and style

- Demo: The Voices from History app mixes English with other languages in historical dialogues while keeping each character's personality consistent
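Speaker locking in the Gemini API works by naming each speaker in a multi-speaker voice configuration and using the same names as line prefixes in the transcript. The sketch below builds that configuration, assuming the `multiSpeakerVoiceConfig` / `speakerVoiceConfigs` shape from the earlier preview documentation, which also limited multi-speaker output to two voices; that limit is encoded here as an assumption. The helpers `build_multi_speaker_config` and `check_transcript` are hypothetical; `Kore` and `Puck` are prebuilt voice names.

```python
import re

# Assumption: earlier Gemini TTS docs limited multi-speaker output to two voices.
MAX_SPEAKERS = 2


def build_multi_speaker_config(voices: dict[str, str]) -> dict:
    """Map speaker names (as used in the transcript) to prebuilt voices."""
    if not 1 < len(voices) <= MAX_SPEAKERS:
        raise ValueError(f"expected 2..{MAX_SPEAKERS} speakers, got {len(voices)}")
    return {
        "multiSpeakerVoiceConfig": {
            "speakerVoiceConfigs": [
                {
                    "speaker": name,
                    "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
                }
                for name, voice in voices.items()
            ]
        }
    }


def check_transcript(transcript: str, voices: dict[str, str]) -> None:
    """Raise if a 'Name:' line in the transcript has no configured voice."""
    used = set(re.findall(r"^(\w+):", transcript, flags=re.MULTILINE))
    missing = used - set(voices)
    if missing:
        raise ValueError(f"no voice configured for: {sorted(missing)}")


voices = {"Narrator": "Kore", "Caesar": "Puck"}
transcript = "Narrator: The Rubicon, 49 BC.\nCaesar: The die is cast."
check_transcript(transcript, voices)
speech_config = build_multi_speaker_config(voices)
```

Validating the transcript before sending catches the most common cause of "mixed-up characters" in a setup like this: a line attributed to a name that has no voice bound to it.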

Industry Feedback: Subscription rate increased by 20%, cost reduced by 20%

- Audio platforms: After integration, the multi-speaker mode proved popular: subscription rates rose by 20%, first-month churn fell by 20%, and operating costs dropped by 20%

- Content studios: Character consistency in voice acting for English and Indian comics drew praise for significantly improving immersion

- Platform plan: In Q1 2026, a low-latency Flash version and a high-quality Pro version will launch simultaneously, serving both real-time and premium use cases

Next Steps: Dual lines of low-latency Flash and premium Pro

Google stated that in Q1 2026 it will optimize the low-latency Flash version (first audio packet in under 300 ms) and the high-quality Pro version (48 kHz sample rate) in parallel and open up edge node deployment, aiming to expand into real-time scenarios such as podcasts, interactive games, and virtual streamers. AIbase will continue to track updates on edge node deployment and pricing.
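Sample rate matters in practice because the Gemini API returns TTS audio as raw PCM rather than a finished file; the earlier preview documented 16-bit mono PCM at 24 kHz. A minimal sketch of wrapping that PCM in a WAV container with Python's standard `wave` module, with the rate left as a parameter in case the Pro model does move to 48 kHz as reported (an assumption until documented). `pcm_to_wav` and `duration_seconds` are hypothetical helper names.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24_000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def duration_seconds(pcm: bytes, sample_rate: int = 24_000,
                     channels: int = 1, sample_width: int = 2) -> float:
    """Playback length implied by the byte count of raw PCM."""
    return len(pcm) / (sample_rate * channels * sample_width)


one_second_silence = bytes(24_000 * 2)  # 24,000 zero-valued 16-bit mono samples
wav_bytes = pcm_to_wav(one_second_silence)
print(duration_seconds(one_second_silence))                      # 1.0
print(duration_seconds(one_second_silence, sample_rate=48_000))  # 0.5
```

The last line shows why the rate must match what the model actually emits: the same bytes labelled as 48 kHz play back in half the time at double the pitch. In the REST response shape documented for the earlier preview, the audio arrives base64-encoded in `inlineData.data`, so it would be decoded with `base64.b64decode` before being passed to `pcm_to_wav`.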

Source: https://x.com/GoogleAIStudio/status/1998876411734692107


This article is from AIbase Daily



