Microsoft Open-Sources Cutting-Edge Speech AI Family VibeVoice: Processes 90-Minute Multi-Speaker Dialogues in One Go, Quickly Gains 27K Stars on GitHub

AIbase基地

Published in AI News · 6 min read · Mar 30, 2026

Microsoft has recently open-sourced VibeVoice, a family of cutting-edge voice AI models covering automatic speech recognition (ASR) and text-to-speech (TTS). The project has quickly drawn attention in the developer community for its long-form audio processing, natural multi-speaker dialogue generation, and low-latency real-time synthesis, and it has already accumulated roughly 27K stars on GitHub.

VibeVoice is released as an open-source research framework under the MIT license. It can be deployed locally with no cloud subscription fees, and it aims to foster collaboration and innovation in speech synthesis. The family consists of three core models, each with its own focus, that together target long-standing pain points in traditional voice AI: processing long sequences, keeping speakers consistent, and producing natural, fluent speech.

VibeVoice-ASR-7B: A Powerful Tool for Structured Speech-to-Text on Up to 60 Minutes of Audio

VibeVoice-ASR-7B is a unified speech-to-text model that can process up to 60 minutes of audio in a single pass and output a structured transcript directly. The transcript records "who is speaking" (speaker identification), "when it was spoken" (precise timestamps), and "what was said" (the content itself). The model also supports custom hotwords, which improves recognition of proper nouns and technical terms. It supports more than 50 languages and suits complex scenarios such as long meeting minutes and podcast transcription.
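
To make the "who / when / what" structure concrete, here is a minimal Python sketch of how an application might hold and render such a transcript after parsing it. The `Segment` fields, the timestamp format, and the helpers `fmt_ts`, `render`, and `talk_time` are hypothetical illustrations only; they are not VibeVoice's actual output schema or API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment (not VibeVoice's real schema)."""
    speaker: str   # "who is speaking": speaker label
    start: float   # "when": segment start, seconds from the beginning of the audio
    end: float     # segment end, in seconds
    text: str      # "what was said"


def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render(segments: list[Segment]) -> str:
    """Render segments in chronological order, one line per segment."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start):
        lines.append(f"[{fmt_ts(seg.start)} - {fmt_ts(seg.end)}] {seg.speaker}: {seg.text}")
    return "\n".join(lines)


def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Sum each speaker's speaking time, in seconds."""
    totals: dict[str, float] = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals


if __name__ == "__main__":
    demo = [
        Segment("Speaker 2", 4.5, 9.0, "Thanks, let's review the roadmap."),
        Segment("Speaker 1", 0.0, 4.5, "Welcome to the weekly sync."),
    ]
    print(render(demo))
    print(talk_time(demo))
```

Keeping start and end as plain seconds and formatting only at display time makes per-speaker statistics, such as talk time in a long meeting, a simple sum.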

Community developers have already built practical tools on top of this model, such as Vibing, a voice input method for macOS and Windows. Early user feedback praises its speed and accuracy and reports a noticeable improvement in everyday voice-input efficiency.

VibeVoice-TTS-1.5B: Expressive Speech Generation for Up to 90 Minutes with Multiple Speakers

VibeVoice-TTS-1.5B is the family's core text-to-speech model. It can generate up to 90 minutes of continuous audio in a single pass, with up to four distinct speakers simulating natural conversation. The output is expressive, natural, and fluent, with realistic pauses, emphasis, and emotional shifts, which makes it well suited to podcasts, long-form audio narratives, audiobooks, and multi-character dialogue.

Whereas many traditional TTS models support only one or two speakers, VibeVoice-TTS makes major gains in long-form, multi-speaker consistency. Its design pairs continuous speech tokenizers (acoustic and semantic) with a low frame rate of 7.5 Hz: each second of audio becomes only 7.5 frames, so even very long recordings map to comparatively short sequences, which significantly improves computational efficiency.
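
The effect of the low frame rate is easy to check with back-of-the-envelope arithmetic. The Python sketch below counts tokenizer frames for several audio lengths at the 7.5 Hz rate given in the article and compares them with a hypothetical 50 Hz tokenizer. The 50 Hz figure is an illustrative assumption, not a number from the VibeVoice project, and the attention-cost ratio assumes plain full self-attention, whose cost grows with the square of sequence length.

```python
def frames(minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames for `minutes` of audio at `frame_rate_hz`."""
    return round(minutes * 60 * frame_rate_hz)


VIBEVOICE_RATE_HZ = 7.5    # frame rate stated in the article
COMPARISON_RATE_HZ = 50.0  # hypothetical higher-rate tokenizer, for illustration only

if __name__ == "__main__":
    for minutes in (1, 10, 60, 90):
        low = frames(minutes, VIBEVOICE_RATE_HZ)
        high = frames(minutes, COMPARISON_RATE_HZ)
        print(f"{minutes:>3} min: {low:>7,} frames at 7.5 Hz vs {high:>7,} at 50 Hz")

    # Sequence length scales linearly with frame rate; full self-attention
    # cost (an assumption about the model) scales with its square.
    ratio = COMPARISON_RATE_HZ / VIBEVOICE_RATE_HZ
    print(f"length ratio: {ratio:.2f}x, full-attention cost ratio: {ratio ** 2:.1f}x")
```

At 7.5 Hz, a 90-minute recording is 90 × 60 × 7.5 = 40,500 frames, which is what makes single-pass generation of that length practical.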

VibeVoice-Realtime-0.5B: Real-Time TTS with Approximately 300 Milliseconds of Latency

VibeVoice-Realtime-0.5B targets real-time scenarios. It accepts streaming text input and produces its first audio output after about 300 milliseconds, while still being able to generate up to 10 minutes of audio. This makes it particularly suitable for interactive applications that need immediate responses, such as real-time voice assistants or live-stream dubbing.
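
A "first audio output delay" figure measures time to first audio: how long a listener waits before hearing anything, as opposed to how long the whole utterance takes to generate. The Python sketch below measures both for any iterator of audio chunks. `fake_streaming_tts` is a hypothetical stand-in that simply sleeps to mimic a start-up delay; it is not the VibeVoice API, and a real streaming engine would be plugged in at that point.

```python
import time
from collections.abc import Iterable, Iterator


def fake_streaming_tts(text_chunks: Iterable[str], first_delay_s: float = 0.3,
                       per_chunk_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical streaming TTS stand-in (NOT the VibeVoice API).

    Yields one dummy audio chunk per incoming text chunk; the first chunk is
    delayed by `first_delay_s` to mimic model start-up latency.
    """
    for i, chunk in enumerate(text_chunks):
        time.sleep(first_delay_s if i == 0 else per_chunk_s)
        yield b"\x00" * (len(chunk) * 320)  # placeholder PCM bytes


def measure_latency(audio_stream: Iterator[bytes]) -> tuple[float, float, int]:
    """Return (time to first audio chunk, total time, chunk count); times in seconds."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in audio_stream:  # generation happens lazily inside this loop
        count += 1
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    return (first if first is not None else total), total, count


if __name__ == "__main__":
    words = "Hello and welcome to the show".split()
    ttfa, total, n = measure_latency(fake_streaming_tts(words))
    print(f"first audio after {ttfa * 1000:.0f} ms; {n} chunks in {total:.2f} s")
```

Because the stream is consumed lazily, playback can begin as soon as the first chunk arrives, so the perceived delay is governed by the time to first audio rather than the total generation time.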

The project has also added experimental speakers, including multilingual voices and several English style variants, giving developers more customization options.

AIbase Review: By open-sourcing VibeVoice, Microsoft not only lowers the barrier to entry for high-performance voice AI but also provides a complete solution for local deployment. The project was temporarily taken down over potential misuse risks, then relaunched with audio watermarks and audible disclaimers embedded as safeguards, reflecting responsible AI development principles. Developers can now obtain the model weights from the GitHub repository and Hugging Face and try them out quickly on platforms such as Colab.

With continued contributions from the open-source community (such as optimizations for Apple Silicon), VibeVoice is expected to see faster adoption in content creation, accessibility tools, and voice interaction. Interested developers can visit Microsoft's official project page to learn more.

Project Address: https://github.com/microsoft/VibeVoice

This article is from AIbase Daily
