With the rapid development of artificial intelligence, the audio generation field has gained a heavyweight player: AudioGenie, developed by Tencent AI Lab. This multimodal audio generation tool is reshaping the global AI audio market with natural, context-appropriate output, strong contextual understanding, and a framework that requires no training.

Multimodal Input, Comprehensive Audio Output  

AudioGenie accepts multimodal inputs such as video, text, and images, and can generate sound effects, speech, music, and mixed audio. Whether the task is immersive background music for a film, voice acting for virtual characters, or realistic environmental sound effects for game scenes, AudioGenie can handle it. Its results are not only natural and smooth but also highly consistent with the context of the input, demonstrating strong semantic understanding. Experiments show that AudioGenie matches or exceeds industry-leading levels in tasks such as video-to-multi-audio and text-to-multi-audio generation.

No Training Required, Self-Correction Leading Technological Innovation  

Unlike traditional audio generation models that require large amounts of training data, AudioGenie adopts a training-free multimodal agent framework in which two layers, a generation team and a supervision team, collaborate. The generation team uses fine-grained task decomposition and an adaptive mixture-of-experts (MoE) mechanism to dynamically select the most suitable model for each piece of audio, ensuring output quality. The supervision team verifies spatiotemporal consistency and drives self-correction through feedback loops, which keeps the generated audio reliable. This design removes the dependence on large-scale paired datasets, significantly reducing development costs while improving generation efficiency.
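The two-layer loop described above can be illustrated with a toy sketch. This is not AudioGenie's code: every name here (`SubTask`, `Clip`, `EXPERTS`, `check`, `generate`) is hypothetical, the "experts" are stand-in functions rather than real audio models, and the supervision check is reduced to a simple duration test against the planned time window. It only shows the general shape of the idea: try candidate experts for each sub-task and accept the first output that passes verification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubTask:
    kind: str      # "sfx", "speech", or "music"
    prompt: str    # text description of the audio event
    start: float   # planned start time, in seconds
    end: float     # planned end time, in seconds


@dataclass
class Clip:
    task: SubTask
    expert: str      # name of the expert that produced the clip
    duration: float  # length of the generated audio, in seconds


def make_expert(name: str, stretch: float) -> Callable[[SubTask], Clip]:
    """Build a stand-in expert; `stretch` simulates timing errors."""
    def run(task: SubTask) -> Clip:
        return Clip(task, name, (task.end - task.start) * stretch)
    return run


# Hypothetical expert pool: several candidates per audio type.
EXPERTS: Dict[str, List[Callable[[SubTask], Clip]]] = {
    "sfx": [make_expert("sfx_a", 1.5), make_expert("sfx_b", 1.0)],
    "speech": [make_expert("tts_a", 1.0)],
    "music": [make_expert("music_a", 2.0), make_expert("music_b", 1.0)],
}


def check(clip: Clip, tol: float = 0.1) -> bool:
    """Supervision stand-in: does the clip fit its planned time window?"""
    planned = clip.task.end - clip.task.start
    return abs(clip.duration - planned) <= tol


def generate(plan: List[SubTask]) -> List[Clip]:
    """Generation loop: for each sub-task, try experts until one passes."""
    accepted: List[Clip] = []
    for task in plan:
        for expert in EXPERTS[task.kind]:
            clip = expert(task)
            if check(clip):  # feedback: a rejected clip triggers the next expert
                accepted.append(clip)
                break
        else:
            raise RuntimeError(f"no expert satisfied the check for {task.prompt!r}")
    return accepted


if __name__ == "__main__":
    plan = [
        SubTask("sfx", "door slams shut", 0.0, 2.0),
        SubTask("speech", "narrator greets the viewer", 2.0, 5.0),
        SubTask("music", "calm piano background", 0.0, 5.0),
    ]
    for clip in generate(plan):
        print(clip.task.kind, clip.expert, clip.duration)
```

In this sketch the decomposed plan is supplied by hand. In a system like the one the article describes, producing that plan from video, text, or images and judging the output would themselves be model-driven steps.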

MA-Bench Benchmark Test, Setting a New Industry Standard  

To comprehensively evaluate multimodal audio generation capabilities, Tencent AI Lab introduced MA-Bench, described as the world's first benchmark dataset for multimodality-to-multi-audio generation (MM2MA) tasks. It contains 198 videos annotated with multiple types of audio. Test results show that AudioGenie reaches or approaches state-of-the-art (SOTA) levels across nine metrics and eight tasks, performing especially well in audio quality, accuracy, content alignment, and aesthetic experience. User surveys further support its advantages in practical use, in scenarios such as game development, film production, and virtual reality.

Market Impact: Challenging the Dominance of Claude and Gemini  

The release of AudioGenie not only gives users an efficient, convenient way to generate audio but also challenges the existing market structure. It arrives alongside the rapid rise of Chinese AI models such as Qwen3, Kimi-K2, and GLM-4.5 in the global market, and further strengthens the competitiveness of Chinese AI companies. OpenRouter data shows that Qwen3 usage increased by 15.4%, while Claude and Gemini usage decreased by 18.9% and 6.8%, respectively. With its multimodal capabilities and cost-effectiveness, AudioGenie is expected to put further pressure on the market share of international giants.

Future Outlook: Opening a New Era of Audio Creation  

The launch of AudioGenie marks a new milestone in AI audio generation technology. Its multimodal input, training-free design, and self-correction features give creators greater flexibility and efficiency. Industry insiders predict that AudioGenie will see widespread application in fields such as media production, game development, and accessibility tools, helping Chinese AI technology shine on the global stage. AIbase will continue to follow AudioGenie's development and bring you the latest industry news.

Summary  

Tencent AudioGenie, with its powerful multimodal audio generation capabilities and innovative training-free framework, is redefining the standards of AI audio generation. In the face of competition from international giants, AudioGenie demonstrates the solid strength of Chinese AI technology. AIbase will continue to track the latest developments in this field and reveal how AI is changing the future of creation!

Project Address: https://audiogenie.github.io/