On June 5, at the 2026 AI Industry Application Conference, Tencent Cloud's audio and video services officially launched WAND, an AI-native capability foundation. Building on more than 20 years of technical accumulation, Tencent Cloud's audio and video services have been upgraded across the board, from the underlying models and media capabilities to the access methods. Their audio and video media AI capabilities will be opened to the industry in an Agent-Native mode, marking a strategic shift from providing single-point media processing capabilities to serving as a native media foundation for AI applications and Agents.

The WAND architecture consists of three layers: a model engine, a capability layer, and scenario solutions. It includes six self-developed media-specific models covering encoding/decoding, enhancement, erasure, generation, understanding, and audio, which compensate for the shortcomings of mainstream generative large models in the media production process. The capability layer reorganizes more than 60 media AI capabilities into four categories (generation, understanding, processing, and encoding) and opens them through three modes: APIs, pre-orchestrated agent workflows (Agentic Workflow), and Skills. This supports end-to-end automatic execution of an entire workflow on the Agent side without switching tools.

WAND Capability Architecture Diagram

In real business scenarios, WAND demonstrates high adaptability and efficiency. In e-commerce applications, WAND's generation model can tailor processing strategies to different product categories, reducing error rates and improving image usability. In short animated drama creation, WAND connects script generation, character consistency maintenance, and other steps into an automated workflow, increasing average production efficiency by 90% and serving more than 80% of the top animated drama platforms in China. Its AI enhancement and seamless erasure technologies jointly won the NAB Show 2026 Product of the Year Award.

Additionally, to meet the high-concurrency and extremely low-latency requirements of live sports streaming, WAND integrates recognition, generation, synthesis, and encoding into a fully automated process through collaborative scheduling of its self-developed models. This saves more than 50% of the bitrate compared with traditional solutions, and WAND has supported thousands of top-tier global events so far.

Having ranked first in market share in China and overseas 11 consecutive times, Tencent Cloud's audio and video services are accelerating the transformation of audio and video capabilities into production-grade tools that Agents can schedule uniformly, fully empowering innovation in audiovisual applications in the AI Agent era.