Microsoft Open Sources VibeVoice-1.5B Model: New Breakthrough in 90-Minute Ultra-Long Speech Synthesis

AIbase基地

Published in AI News · 4 min read · Aug 26, 2025

Microsoft Research recently open-sourced its latest audio model, VibeVoice-1.5B. The model brings several advances in speech synthesis: output that sounds more natural, runs longer, and holds higher quality.

VibeVoice-1.5B can synthesize up to 90 minutes of speech in a single pass, which is rare among earlier speech synthesis models. Previously, most models could only synthesize speech of under 60 minutes, and they often showed voice drift and semantic incoherence once the output exceeded 30 minutes. The model also supports up to four speakers, a significant improvement in multi-speaker synthesis, whereas earlier open-source models supported at most two. In addition, VibeVoice compresses 24kHz raw audio by a factor of 3200, greatly improving compression efficiency while maintaining high-fidelity speech quality.
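To put the compression figure in perspective, here is a rough back-of-the-envelope calculation. It assumes the 3200x figure is a pure time-axis downsampling, that is, 3200 raw audio samples per compressed frame; the article does not spell this out, so treat the result as an estimate rather than a specification. The function names are illustrative only.

```python
# Figures quoted in the article.
SAMPLE_RATE_HZ = 24_000      # 24kHz raw audio
COMPRESSION_FACTOR = 3_200   # "3200 times" compression
MAX_MINUTES = 90             # longest single-pass synthesis


def frames_per_second(sample_rate_hz: int, factor: int) -> float:
    """Compressed frames per second of audio.

    ASSUMPTION: the compression factor is the number of raw samples
    represented by one compressed frame.
    """
    return sample_rate_hz / factor


def total_frames(minutes: float, frame_rate_hz: float) -> int:
    """Number of compressed frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


frame_rate = frames_per_second(SAMPLE_RATE_HZ, COMPRESSION_FACTOR)
raw_samples = MAX_MINUTES * 60 * SAMPLE_RATE_HZ

print(frame_rate)                             # 7.5
print(total_frames(MAX_MINUTES, frame_rate))  # 40500
print(raw_samples)                            # 129600000
```

Under that assumption, a full 90-minute session shrinks from about 130 million raw samples to roughly 40,000 frames, which helps explain how a sequence model can handle such long audio.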

The core of the VibeVoice model is its dual tokenizer architecture. Unlike traditional TTS models that rely on a single tokenizer to extract features, VibeVoice has an acoustic tokenizer and a semantic tokenizer work together, which addresses the mismatch between voice characteristics and semantic content. The acoustic tokenizer focuses on preserving voice characteristics and achieving extreme compression, while the semantic tokenizer extracts features consistent with the meaning of the text, so that the emotional tone of the synthesized speech matches the content.

For training, VibeVoice adopts a curriculum learning strategy: the input sequence length is increased gradually, which avoids the training failures that can occur when a model is fed ultra-long sequences from the start. The parameters of the acoustic tokenizer and the semantic tokenizer are kept frozen throughout training, which keeps the feature extraction modules stable and shortens the training cycle.
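The two training ideas described above can be sketched in a few lines. This is a minimal illustration, not Microsoft's training code: the stage lengths, step counts, and parameter names such as `acoustic_tokenizer.` are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    max_tokens: int  # longest sequence allowed during this stage
    steps: int       # number of training steps spent in this stage


# HYPOTHETICAL schedule: the article only says the length grows gradually.
CURRICULUM = [
    Stage(max_tokens=4_096, steps=1_000),
    Stage(max_tokens=16_384, steps=1_000),
    Stage(max_tokens=32_768, steps=1_000),
    Stage(max_tokens=65_536, steps=1_000),
]

# HYPOTHETICAL parameter-name prefixes for the two frozen tokenizers.
FROZEN_PREFIXES = ("acoustic_tokenizer.", "semantic_tokenizer.")


def max_len_at(step, curriculum=CURRICULUM):
    """Return the sequence-length cap in force at a given training step."""
    boundary = 0
    for stage in curriculum:
        boundary += stage.steps
        if step < boundary:
            return stage.max_tokens
    # Past the end of the schedule: stay at the final (longest) cap.
    return curriculum[-1].max_tokens


def trainable_names(param_names):
    """Keep only parameters that do not belong to a frozen tokenizer."""
    return [name for name in param_names if not name.startswith(FROZEN_PREFIXES)]


names = ["acoustic_tokenizer.enc.w", "semantic_tokenizer.enc.w", "backbone.layer0.w"]
print(max_len_at(0), max_len_at(1_500), max_len_at(10_000))  # 4096 16384 65536
print(trainable_names(names))  # ['backbone.layer0.w']
```

In a real training loop, the cap from `max_len_at` would be used to truncate or filter batches, and only the parameters returned by `trainable_names` would be handed to the optimizer.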

The open-sourcing of VibeVoice-1.5B not only brings new advances to the field of speech synthesis but also lays the groundwork for the release of models with more parameters in the future. For researchers and developers in audio processing and speech synthesis, it is a development worth watching.

Open source address: https://huggingface.co/microsoft/VibeVoice-1.5B

Online demo: https://aka.ms/VibeVoice-Demo

Key points:
🔊 The VibeVoice-1.5B model can synthesize ultra-long speech of up to 90 minutes in one go and supports up to four speakers.
💾 The model compresses 24kHz audio by a factor of 3200 while maintaining high-fidelity speech quality.
🤖 It uses a dual tokenizer architecture to solve the problem of mismatch between voice and semantics.




This article is from AIbase Daily
