Best Speech Recognition AI Tools & Models - Premium Speech Recognition News

AI News

Efficient and Lightweight: IBM Launches Granite 4.0 1B Speech Multimodal Speech Large Model

IBM has launched Granite 4.0 1B Speech, a model optimized for edge computing and enterprise use. It halves the parameter count while improving performance, supports multilingual ASR and bidirectional translation, adds Japanese recognition and keyword biasing, and significantly improves English transcription accuracy...

15k yesterday

Google Invests Heavily in Medical AI Open Source Ecosystem: MedGemma 1.5 Enhances Medical Imaging Capabilities, Simultaneously Launches Speech-to-Text Model MedASR

Google launched the new-generation open-source medical large model MedGemma 1.5 and the clinical speech recognition model MedASR, strengthening its position in medical technology. MedGemma 1.5, based on the Gemma series, improves medical image understanding and processes text records, test reports, medical literature, and imaging data such as X-rays and CT scans to aid preliminary screening and diagnosis...

12.2k yesterday

Amazon Officially Launches Alexa+ and the Dedicated Website Alexa.com, Officially Challenging ChatGPT!

Amazon has launched a dedicated website for Alexa+ that lets users interact with the assistant directly in a browser, a sign of intensifying competition with ChatGPT. The new AI assistant is significantly better at speech recognition and understanding.

11.2k 03-18

Vice Director of Tencent AI Lab Resigns, Hunyuan Team Welcomes New Leadership Transition, Where Is the Future of Tencent AI Heading?

Yu Dong, deputy director of Tencent AI Lab, has resigned for personal development reasons. He was responsible for R&D in speech processing, natural language processing, and digital human technology, and has extensive experience in deep learning and speech recognition...

11.9k 2 days ago

AI Products

Speechly

Turn your thoughts into a professional email in just seconds, ready to send anytime.

Mail Assistant

7.4k

Unmute

Converse with AI using low-latency speech recognition and synthesis models.

Voice recognition

9.8k

parakeet-tdt-0.6b-v2

A high-quality English automatic speech recognition model that supports punctuation and timestamp prediction.

Voice recognition

11.4k

Kimi-Audio

Kimi-Audio is an open-source audio foundation model that excels in audio understanding and generation.

Voice recognition

12.7k

Models

Model | Provider | Input price (per M tokens) | Output price (per M tokens) | Context Length
Grok 4 Fast | xAI | $1.4 | $3.5 | -
Gemini 2.0 Flash | Google | $0.7 | $2.8 | -
Claude Haiku 4.5 | Anthropic | - | $35 | 200
Claude Sonnet 4.5 | Anthropic | $21 | $105 | 200
Claude 3 Sonnet | Anthropic | $21 | $105 | 200
qwen3-vl-plus | Alibaba | - | $10 | 256
qwen3-livetranslate-flash-realtime-2025-09-22 | Alibaba | - | $240 | -
wan2.5-t2i-preview | Alibaba | - | - | -
qwen3-omni-30b-a3b-captioner | Alibaba | $15.8 | $12.7 | -
qwen3-tts-flash | Alibaba | - | - | -
qwen3-tts-flash-realtime | Alibaba | - | - | -
Doubao-Seedream-4.0 | ByteDance | - | - | -
Doubao-Seedream-3.0-t2i | ByteDance | - | - | -
Doubao-SeedEdit-3.0-i2i | ByteDance | - | - | -
qwen3-asr-flash | Alibaba | - | - | -
qwen-vl-plus | Alibaba | $0.8 | - | 128
Doubao-Seedance-1.0-pro | ByteDance | - | - | -
Grok Code Fast 1 | xAI | $1.4 | $10.5 | 256
Hunyuan-T1-latest | Tencent | - | - | -
GPT-5 nano | OpenAI | $0.35 | $2.8 | 400
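
The per-million-token prices in the model list can be turned into a rough per-request cost. The sketch below is only an illustration: the function name estimate_cost and the token counts are hypothetical, and the two prices are the GPT-5 nano figures as listed on this page (the listing does not state the currency or billing details, so treat the result as an estimate).

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of one request from per-million-token prices.

    The result is in whatever currency the listed prices use.
    """
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Prices from the GPT-5 nano entry in the listing; token counts are made up.
cost = estimate_cost(
    input_tokens=200_000,
    output_tokens=50_000,
    input_price_per_m=0.35,
    output_price_per_m=2.8,
)
print(round(cost, 4))
```

Entries with a missing input or output price cannot be estimated this way without filling in the gap from another source.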

MCP

Douyin Mcp Server

A Douyin (TikTok) video processing server built on MCP, supporting watermark-free video download, audio extraction, and text conversion.

python

10.8k

3.0 points

Speech Interface (Faster Whisper)

Speech MCP is a voice interaction extension designed for Goose, providing real-time voice recognition, text-to-speech, and audio visualization functions.

python

6.4k

2.5 points

Mcp Video Extraction Plus

This project extends video speech recognition. It originally supported only a local Whisper model and now also supports the online speech recognition services of CapCut and Bcut, with a flexible multi-service selection architecture.

python

9.7k

2.5 points

Fast Whisper MCP Server

A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities, supporting batch processing, multiple model sizes, and multiple output formats.

python

11.3k

2.5 points

Speech Mcp

Speech MCP is a voice interaction extension designed for Goose, providing real-time voice recognition, high-quality text-to-speech, multilingual support, and a modern audio visualization interface. It supports multi-character dialogue generation and audio transcription functions.

python

10k

2.5 points

Mcp Speaker Diarization

The MCP Speaker Diarization and Recognition System is a complete solution that integrates GPU-accelerated speaker diarization, speech recognition, emotion detection, and a web interface. It combines pyannote.audio speaker diarization with faster-whisper transcription and supports persistent speaker recognition (register once, recognize permanently), dual-detector emotion analysis (combining general AI and personalized voiceprints), real-time stream processing, a REST API, and an MCP server. It is designed for AI agent integration and hobby projects.

python

9.9k

2.0 points

Asr_mcp_server

The ASR MCP server is an automatic speech recognition service based on the Whisper engine. It exposes speech recognition through MCP tools for easy application integration.

python

10.2k

2.0 points

Dy Xhs Mcp Server

The Douyin and Xiaohongshu Content Extraction MCP Server supports extracting video, image, and text content from Douyin and Xiaohongshu sharing links, providing functions such as watermark-free video acquisition, AI speech recognition, and text extraction.

python

2.0 points

Models

Crisperwhisper Unsloth Mlx 8b

GigaAM V3

Asr 19m V2 En 32b

Whisperv

Whisper Small Swh Finetuned

Everos

MERaLiON SER V1

Whisper Small Serlabs Twi Asr

Ming Flash Omni Preview

Whisper Small Bambara V2 Kis

Latin_whisper Small

Medwhisper Large V3 Ita

Parakeet Ctc 1.1b

Borealis

Whisper Small Ru Cv17

Asr Whisper Helpline Sw V1

SE_DiCoW

Moonshine Tiny Vi

Whisper Large V3 Finetuned For ATC

Parakeet Tdt 0.6b V3 Coreml