Best Vocoder AI Tools & Models - Premium Vocoder News

AI News

MOSS-Speech Open Source: China's First Speech-to-Speech Large Model, Bypassing Text Intermediate

The MOSS team at Fudan University has released MOSS-Speech, billed as China's first large model for end-to-end speech-to-speech dialogue. The model is open source and available on Hugging Face. It adopts a 'layer splitting' architecture: the original text model is frozen, and new layers are added for speech understanding, semantic alignment, and the vocoder. It handles spoken Q&A, emotional imitation, and laughter generation in a single step, replacing the traditional three-step pipeline of speech recognition, text generation, and speech synthesis. According to the reported evaluation, the word error rate fell to 4.1% on the ZeroSpeech2025 task, and emotion recognition accuracy reached 91.2%.
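For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and the recognized transcript, divided by the number of reference words. The sketch below illustrates that standard definition only. The function name and the sample sentences are made up for illustration, and the article does not say how the MOSS-Speech evaluation normalized text or computed its 4.1% figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); splits on whitespace and does
    no text normalization, which a real evaluation would likely add.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word deleted
                curr[j - 1] + 1,                  # extra word inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
rate = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{rate:.1%}")
```

On this definition, a 4.1% word error rate corresponds to roughly four word-level errors per hundred reference words.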

Tencent Launches EzAudio AI: Transforming Text into Realistic Audio in Seconds

Johns Hopkins University and Tencent AI Lab have jointly launched a new text-to-audio generation model named EzAudio. The model promises efficient, high-quality text-to-audio generation. EzAudio operates in the latent space of audio waveforms rather than on traditional spectrograms, which lets it work at high temporal resolution without an additional neural vocoder. The architecture of EzAudio is referred to as EzAu.

Models

qwen3-coder-plus (Alibaba): input price not listed; output $16 per M tokens; context length not listed

qwen3-coder-flash (Alibaba): prices and context length not listed

QianfanHuijin-8B (Baidu): prices and context length not listed

Qianfan-QI-VL (Baidu): prices and context length not listed

Qianfan-Llama-VL-8B (Baidu): prices and context length not listed

Gemma 3 4B (Google): input $0.14 per M tokens; output $0.28 per M tokens; context length 131

CogView-4 (Chatglm): prices and context length not listed

Qwen_v2.5_1.5b_base (Alibaba): prices and context length not listed

Qwen_v2.5_1.5b_Instruct (Alibaba): prices and context length not listed

Qwen_v2.5_0.5b_Instruct (Alibaba): prices not listed; context length 128

Gemini 1.5 Flash (Google): input $1.05 per M tokens; output $4.2 per M tokens; context length not listed

Baichuan-7B (Baichuan): prices and context length not listed
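As a rough guide to reading the listed prices, the sketch below estimates the cost of a single request from the Gemma 3 4B figures ($0.14 input, $0.28 output). It assumes that "tokens/M" means US dollars per one million tokens, which the listing does not state explicitly. The function name and the token counts are hypothetical.

```python
from decimal import Decimal


def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate the cost of one request.

    Assumption: prices are quoted in USD per one million tokens.
    Prices are passed as strings so Decimal keeps them exact.
    """
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * Decimal(input_price_per_m)
    output_cost = Decimal(output_tokens) * Decimal(output_price_per_m)
    return (input_cost + output_cost) / million


# Hypothetical workload: 200,000 input tokens and 50,000 output tokens
# at the Gemma 3 4B prices shown in the listing.
cost = request_cost(200_000, 50_000, "0.14", "0.28")
print(f"Estimated cost: ${cost}")
```

Decimal is used instead of float so that small per-token prices do not pick up binary rounding error when many requests are summed.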

Bigvgan_melspec

Audio Codec 44khz

Bigvgan_base_24khz_100band

Bigvgan_base_22khz_80band

Bigvgan_24khz_100band

Bigvgan_22khz_80band

Bigvgan_v2_44khz_128band_512x

Bigvgan_v2_44khz_128band_256x

Bigvgan_v2_22khz_80band_fmax8k_256x

Bigvgan_v2_22khz_80band_256x

Bigvgan_v2_24khz_100band_256x

Vocos Mel Hifigan Compat 44100khz

Vocoder_Daft_Punk__RVC_ _200_Epochs_

Nvidia_tts_en_hifitts_hifigan_ft_fastpitch

Tts_ru_ipa_fastpitch_ruslan

Tts Hifigan German

Unit_hifigan_HK_layer12.km2500_frame_TAT TTS

Hifigan Lj V1