Microsoft Open-Sources Real-Time Speech Model VibeVoice-Realtime-0.5B: Speech Starts in About 300 ms and Holds Up Even Across 90-Minute Audio
Microsoft has open-sourced VibeVoice-Realtime-0.5B, a real-time speech model that offers very low latency and near-human voice quality. On average, the model takes only about 300 milliseconds from text input to voice output, far less than the 1-3 seconds typical of traditional text-to-speech (TTS) models, so synthesis feels nearly instantaneous.