Microsoft Launches VibeVoice-Realtime: A New Real-Time Text-to-Speech Model for Interactive Applications
Microsoft has launched VibeVoice-Realtime-0.5B, a lightweight real-time text-to-speech model that accepts streaming text input and produces long-form speech output, aimed at agent applications and live data narration. It starts producing speech in about 300 ms, can be paired with a language model to voice its responses, and is built on a next-token diffusion framework that operates on continuous speech tokens...
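To make "streaming input" concrete, here is a minimal Python sketch of the general pattern: text pieces arrive from a language model, and audio is emitted as soon as a small amount of text has accumulated rather than after the full reply. It is not VibeVoice's API. `fake_llm_tokens`, `fake_tts`, `stream_speech`, `time_to_first_audio`, and the `min_chars` threshold are all hypothetical names chosen for illustration, and the stand-in synthesizer returns silence, so the measured latency says nothing about the 300 ms figure.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple

SAMPLES_PER_CHAR = 1200  # assumption: arbitrary audio length per character


def fake_llm_tokens() -> Iterator[str]:
    """Hypothetical stand-in for a language model streaming its reply."""
    for piece in ["Hello", " there,", " the", " market", " is", " up", " today."]:
        yield piece


def fake_tts(text: str) -> bytes:
    """Hypothetical synthesizer: silent 16-bit PCM sized to the text, not real speech."""
    return b"\x00\x00" * (SAMPLES_PER_CHAR * len(text))


def stream_speech(
    pieces: Iterable[str],
    synthesize: Callable[[str], bytes],
    min_chars: int = 8,
) -> Iterator[bytes]:
    """Yield an audio chunk whenever at least min_chars of text has accumulated."""
    buffer = ""
    for piece in pieces:
        buffer += piece
        if len(buffer) >= min_chars:
            yield synthesize(buffer)
            buffer = ""
    if buffer:
        # Flush whatever text is left once the input stream ends.
        yield synthesize(buffer)


def time_to_first_audio(chunks: Iterator[bytes]) -> Tuple[float, Optional[bytes]]:
    """Return (seconds until the first chunk arrived, that chunk or None)."""
    start = time.perf_counter()
    first = next(chunks, None)
    return time.perf_counter() - start, first


if __name__ == "__main__":
    latency, first = time_to_first_audio(stream_speech(fake_llm_tokens(), fake_tts))
    print(f"first chunk: {len(first or b'')} bytes after {latency * 1000:.3f} ms")
```

Because `stream_speech` is a generator, the first chunk is available as soon as the threshold is crossed, while later text is still arriving. In a real system `synthesize` would be replaced by the model's own streaming inference call.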