Gemini 2.5 Released with Native Audio Functionality: AI Conversations Become More Natural

AIbase基地

Published in AI News · 6 min read · Jun 5, 2025

Google has released Gemini 2.5, which brings significant progress in AI audio dialogue and audio generation. Gemini 2.5 is a multimodal AI system that natively understands and generates text, images, audio, video, and code, making interaction with AI richer.

Real-time audio dialogue in Gemini 2.5 makes human-machine communication more natural. Human conversation carries tone, accent, and non-verbal sounds such as laughter, and Gemini's audio generation can reflect all of these. Low latency keeps the exchange fluid, and users can steer the conversation style in plain language, for example by choosing a different accent or tone, or by asking the model to whisper.

Real-Time Audio Dialogue

Human conversation is rich and nuanced, conveying meaning not only through words but also through tone, accent, and non-verbal sounds such as laughter. Gemini 2.5 aims for efficient, real-time communication through audio. Its audio dialogue function includes the following features:

Natural Conversation: Delivers high-quality voice interaction with appropriate expressiveness and rhythm, keeping the conversation smooth and natural at very low latency.
Style Control: Users can adjust the tone, accent, and emotional expression of the conversation through natural language prompts, and can even ask the model to whisper.
Tool Integration: During a conversation, Gemini 2.5 can call tools and functions to retrieve information in real time from sources such as Google Search, making the conversation more useful.
Dialogue Context Awareness: The system recognizes and ignores background noise and unrelated conversations, so that it responds at appropriate moments.
Audio and Video Understanding: Supports real-time audio and video streams, so users can discuss video content or screen-shared information with the model.
Multi-Language Support: Supports over 24 languages and can switch between languages within the same conversation.
Emotional Dialogue: Responds to the user's tone of voice, recognizing the emotional differences between different ways of saying something.
Advanced Thinking Dialogue: Draws on reasoning capabilities to make conversations more coherent and intelligent, particularly in complex problem-solving scenarios.

Controllable Text-to-Speech Technology

Gemini 2.5 also advances text-to-speech (TTS) technology, producing natural-sounding speech while giving users fine-grained control over the audio. Users can generate anything from short phrases to long narratives and control style, tone, emotion, and delivery, all through natural language prompts.

Dynamic Performance: Reads text expressively, suited to poetry recitation, news broadcasting, and storytelling, and can perform with specific emotions and accents.
Speed and Pronunciation Control: Users can control speaking speed and ensure that specific words are pronounced accurately.
Multi-speaker Dialogue Generation: Generates two-person dialogue audio from text input, making content more engaging.
Multi-language Audio Generation: Generates audio content in multiple languages, with support for over 24 languages.
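
To make the idea of steering speech through natural language prompts more concrete, here is a minimal Python sketch of how a developer might assemble such a prompt for a two-person dialogue. It only builds a text prompt and does not call any Gemini API. The names Turn and build_tts_prompt, and the prompt layout itself, are hypothetical choices made for illustration, not a format documented by Google.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str     # label used in the transcript, e.g. "Host"
    text: str        # the words to be spoken
    style: str = ""  # optional delivery note, e.g. "whispering"


def build_tts_prompt(turns, overall_style="", pace=""):
    """Compose a plain-text prompt: style instructions first, then the transcript.

    Hypothetical helper: the layout is an assumption for illustration only.
    """
    # Collect distinct speakers in order of first appearance.
    speakers = []
    for turn in turns:
        if turn.speaker not in speakers:
            speakers.append(turn.speaker)
    # The article describes two-person dialogue, so this sketch enforces that limit.
    if len(speakers) > 2:
        raise ValueError("this sketch assumes at most two speakers")

    header = []
    if overall_style:
        header.append(f"Overall style: {overall_style}.")
    if pace:
        header.append(f"Speaking pace: {pace}.")

    lines = []
    for turn in turns:
        note = f" ({turn.style})" if turn.style else ""
        lines.append(f"{turn.speaker}{note}: {turn.text}")

    return "\n".join(header + lines)


if __name__ == "__main__":
    dialogue = [
        Turn("Host", "Welcome back to the show.", "cheerful"),
        Turn("Guest", "Glad to be here.", "whispering"),
    ]
    print(build_tts_prompt(dialogue, overall_style="relaxed podcast", pace="slow"))
```

The resulting string would then be passed as the text input to whichever TTS endpoint the developer uses. Keeping style notes in plain language mirrors the prompt-based control described above.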

While developing Gemini 2.5, Google assessed potential risks and put corresponding mitigation strategies in place. All audio output is embedded with SynthID, a watermarking technology, so that AI-generated audio remains transparent and identifiable.

Gemini 2.5 gives developers rich native audio capabilities for building more interactive applications through Google AI Studio or the Gemini APIs in Vertex AI. Developers can test native audio dialogue with Gemini 2.5 Flash Preview in the streaming tab of Google AI Studio, or use controllable text-to-speech generation in applications such as announcements, stories, podcasts, and video games.

This article is from AIbase Daily
