Recently, the Index team at Bilibili (B站) announced the full open-source release of its in-house text-to-speech (TTS) system, IndexTTS-2.0. The system offers controllable emotion and adjustable duration, marking an important step toward the practical application of zero-shot TTS technology.


Duration control and emotional expressiveness have long been difficult problems in speech synthesis. IndexTTS-2.0 addresses them with two core innovations. The first is a time-encoding mechanism, applied for the first time in an autoregressive TTS architecture, which greatly improves the accuracy of speech-duration control and allows precise control over speech rhythm while keeping the generated speech stable and natural. The second is disentangled modeling of voice and emotion: because voice identity and emotional style are modeled separately, users can control emotion through any of several methods, including a single reference audio, a separate emotional reference audio, an emotion vector, or a text description. This flexibility significantly improves the expressiveness of synthesized speech and covers a wide range of emotional-expression needs.

According to the official examples, IndexTTS-2.0 can be widely applied to AI dubbing, audiobooks, animated comics, video translation, voice dialogue, and podcast production, expanding the boundaries of speech synthesis technology. For international content distribution in particular, IndexTTS-2.0 provides important technical support, enabling cross-language videos to deliver a nearly "difference-free" localized experience. Whether Chinese users are watching foreign content or overseas users are watching Chinese videos, they can enjoy a natural, immersive listening experience that preserves the original speaker's voice and emotion. This breakthrough lowers the barrier for high-quality content to spread across languages and lays a solid foundation for deploying AIGC technology globally.

The project's paper, complete code, model weights, and online demo page have all been released. The IndexTTS team says it will continue to optimize model performance and work with the developer community to build a speech-technology ecosystem for multilingual communication and global cultural exchange.

Online demo address:

https://huggingface.co/spaces/IndexTeam/IndexTTS-2-Demo

Key points:

🌟 B站's IndexTTS-2.0 system is fully open source, with controllable emotion and adjustable duration.

🕒 The system introduces a time-encoding mechanism and disentangled modeling, improving the naturalness and expressiveness of speech synthesis.

🌍 The system supports international content distribution, offering a better localized experience for cross-language videos.