ByteDance's Volcano Engine has released two new DouBao voice models aimed at making speech recognition and speech generation more intelligent. The two models, the DouBao · Voice Podcast Model and the DouBao · Real-Time Voice Model, both achieve significant breakthroughs on multiple technical metrics.

DouBao · Voice Podcast Model

According to the official introduction, given only a sentence, a web link, a long text, or a document, the voice podcast model can quickly search and learn, generate a podcast script, and create content. It can instantly generate natural-sounding two-person conversational podcasts, with mutual agreement, interruptions, hesitations, and similar cues that match the rhythm of a podcast. Its built-in deep search function lets it generate podcast content that follows trending topics.

DouBao · Real-Time Voice Model: Immediate Communication, Seamless Interaction

The DouBao · Real-Time Voice Model focuses on real-time speech recognition and generation and is applicable to scenarios such as online meetings and educational training. Its main capabilities are described below.

The model is open to enterprise customers. It supports advanced control through natural-language instructions, with capabilities such as singing, voice impersonation, and dialect interpretation. Its tone, expression, and way of thinking are significantly more human-like, and it supports interruption at any time and can initiate conversations proactively.

With the launch of these two DouBao voice models, ByteDance's Volcano Engine further strengthens its position in voice technology. In both podcast content creation and real-time voice interaction, the two models show considerable application potential and market prospects. Going forward, Volcano Engine says it will remain committed to technological innovation and continue to advance the development of voice interaction.