Recently, Alibaba's TONGYI Lab officially launched FunAudio-ASR, its latest end-to-end large speech recognition model. The model's biggest highlight is its innovative context module, which significantly improves recognition accuracy in high-noise environments, cutting the hallucination rate from 78.5% to 10.7%, a drop of nearly 68 percentage points (roughly an 86% relative reduction). This breakthrough sets a new benchmark for the speech recognition industry and makes the model especially well suited to noisy settings such as meetings and public spaces.

The FunAudio-ASR model was trained on tens of millions of hours of audio data and integrates the semantic understanding capabilities of large language models, allowing it to outperform mainstream speech recognition systems such as Seed-ASR and KimiAudio-8B in challenging conditions, including far-field, noisy, and multi-speaker scenarios. For end users, this translates into clearer and more accurate transcription results.


In addition to the full version, Alibaba also released a lightweight variant called FunAudio-ASR-nano. This version maintains high recognition accuracy while reducing inference costs, making it suitable for resource-constrained deployment environments. Whether a large enterprise or a small team, users can find a version that fits their needs.


Currently, FunAudio-ASR is already in production use in DingTalk's "AI Note-taking" feature, video conferencing, and the DingTalk A1 hardware. Its API has also been officially launched on Alibaba Cloud's BaiLian platform, making it easy for developers to integrate. For enterprise users, this means they can leverage the technology to improve meeting efficiency and communication quality.
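To give a feel for what integration might look like, below is a minimal sketch of assembling a transcription request for an ASR service of this kind. Note that the model identifier, field names, and endpoint structure here are assumptions for illustration only, not the documented BaiLian API; consult the official platform documentation for the real interface.

```python
import json

def build_asr_request(audio_url: str, model: str = "fun-asr", language: str = "auto") -> str:
    """Assemble a JSON payload for a hypothetical file-transcription call.

    The schema below (model / input / parameters) is a common pattern for
    cloud ASR APIs, used here purely as an illustrative assumption.
    """
    payload = {
        "model": model,                       # hypothetical model identifier
        "input": {"file_url": audio_url},     # remote audio file to transcribe
        "parameters": {"language": language}, # e.g. auto-detect the language
    }
    return json.dumps(payload, ensure_ascii=False)

# Example: prepare a request for a meeting recording (URL is a placeholder).
request_body = build_asr_request("https://example.com/meeting.wav")
print(request_body)
```

An actual integration would POST this body, with an API key header, to the platform's transcription endpoint and then poll for or receive the recognized text.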

FunAudio-ASR not only marks a new breakthrough in speech recognition technology but also provides strong support for real-world applications, helping drive the broader adoption of AI.

Official introduction: https://mp.weixin.qq.com/s/7l5EPTU7cpz7GSN4RP91rg