At a time when AI video generation is booming, videos with "pictures but no sound" or mismatched audio have remained the last barrier to immersion. To address this pain point, Alibaba's Tongyi Lab recently introduced PrismAudio, a new video-to-audio framework. The work has been accepted at ICLR 2026, a top AI conference, and its core aim is to automatically match videos with precise ambient sound effects.

Think First, Then Speak: Voice Acting with a "Chain of Thought"

Traditional video-to-audio models usually generate sounds "on intuition," which often leads to awkward results: a horse steps on the ground but a bird call plays, or the sound lags half a beat behind the visuals. PrismAudio's breakthrough is that it "takes notes first, then speaks."

  • Decomposed Chain of Thought: Before generating any audio, the model analyzes the video content: What is in the scene? When should each sound start? Should the timbre be crisp or deep? Is the sound source on the left or the right?

  • Four "Teachers" Scoring: To ensure quality, the team introduced reinforcement learning in which four "virtual teachers" score each output along four dimensions: semantic consistency, temporal synchronization, aesthetic quality, and spatial accuracy. This multi-dimensional feedback resolves the long-standing tendency of earlier models to optimize one aspect while neglecting the others.
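
The four-dimensional scoring above can be sketched as a group-relative reward in the spirit of GRPO-style training: a group of candidate clips is scored per dimension, the scores are combined, and each candidate's advantage is its reward normalized within the group. All names, weights, and numbers below are illustrative assumptions, not PrismAudio's actual implementation:

```python
# Hypothetical sketch of multi-dimensional reward scoring in a GRPO-style
# setup. Four "teachers" score a group of 4 candidate audio clips; the
# scores are combined into one reward per clip, then group-normalized.
from statistics import mean, pstdev

# Per-dimension scores for 4 candidate clips (made-up numbers).
scores = {
    "semantic":  [0.8, 0.6, 0.9, 0.5],   # does the sound match the scene?
    "temporal":  [0.7, 0.9, 0.6, 0.8],   # does it start/stop on time?
    "aesthetic": [0.9, 0.7, 0.8, 0.6],   # is it pleasant to listen to?
    "spatial":   [0.6, 0.8, 0.7, 0.9],   # is the source placed correctly?
}
weights = {"semantic": 0.3, "temporal": 0.3, "aesthetic": 0.2, "spatial": 0.2}

# Combine the four dimensions into one scalar reward per candidate.
rewards = [sum(weights[d] * scores[d][i] for d in weights) for i in range(4)]

# Group-relative advantage: normalize rewards within the group, so each
# candidate is compared against its own batch rather than a learned critic.
mu, sigma = mean(rewards), pstdev(rewards)
advantages = [(r - mu) / (sigma + 1e-8) for r in rewards]
```

Because no single dimension decides the reward, a clip that nails timing but misplaces the sound source cannot dominate the group, which is the point of scoring along all four axes at once.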

Lightweight and Efficient: 0.63 Seconds for 9 Seconds of Audio

PrismAudio is not only accurate but also extremely fast. Thanks to its in-house Fast-GRPO training algorithm, the model achieves a significant performance leap while remaining highly efficient:

  • Small Size, Big Power: The model has only 518 million parameters, far below the tens of billions typical of similar models.

  • Ultra-Fast Response: Generating 9 seconds of high-quality audio takes only 0.63 seconds, effectively "instant delivery."
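
As a quick sanity check on those figures, the reported speed corresponds to a real-time factor (seconds of audio produced per second of compute) of roughly 14x:

```python
# Back-of-the-envelope check of the reported speed: 9 s of audio generated
# in 0.63 s of compute means the model runs about 14x faster than playback.
audio_seconds = 9.0
compute_seconds = 0.63
rtf = audio_seconds / compute_seconds
print(f"real-time factor: {rtf:.1f}x")  # real-time factor: 14.3x
```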

Industry Insight: The Era of Authentic Environmental Sound Effects

The emergence of PrismAudio not only gives film post-production and short-video creation a powerful automation tool, but also offers new ideas for multi-objective generation tasks. When AI can accurately balance the texture and spatial placement of sound, video creation will truly achieve "what you see is what you hear."

Paper link: arXiv:2511.18833

Project page: https://prismaudio-project.github.io/