A quiet but significant breakthrough has emerged in AI video generation. Kuaishou's KlingAI recently launched a new digital human model, Avatar2.0. With just a single photo of a person and a music audio clip, users can generate a 5-minute singing video. Digital humans are no longer stiff "lip-sync" puppets but "performers" who naturally raise their eyebrows, smile with their eyes, and move their bodies to the rhythm. The upgraded model is now live on the Kling platform, marking a leap in AI content creation from "static" portraits to "dynamic storytelling."


 Core Innovation: Intelligent Leap from Audio to Emotional Performance

The core of Avatar2.0 is its multimodal director module (MLLM Director), which uses multimodal large language models (MLLMs) to convert three user inputs (image, audio, and text prompt) into a coherent storyline. The system first extracts the speech content and emotional trajectory from the audio, for example injecting "excitement" during upbeat melodies or locking onto the drum beat during rap sections. It then identifies facial features and scene elements from the single photo and incorporates user text such as "slow zoom up" or "arms moving rhythmically." Finally, by injecting these text conditions across the attention layers of the video diffusion model, it generates a globally consistent "blueprint video" with smooth rhythm and a unified style throughout.
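To make that data flow concrete, here is a minimal Python sketch of how such a director module might bundle the three inputs into a single plan. Every class and function name here (BlueprintPlan, analyze_audio, parse_image, plan_blueprint) is a hypothetical stand-in that only illustrates the described flow; it is not Kling's actual implementation or API.

```python
from dataclasses import dataclass


@dataclass
class BlueprintPlan:
    """Global storyline that the video diffusion model is conditioned on."""
    transcript: str      # spoken/sung content extracted from the audio
    emotion_track: list  # per-segment emotion labels, e.g. ["calm", "excited"]
    subject: dict        # identity and scene attributes parsed from the photo
    directions: str      # user text such as "slow zoom up"


def analyze_audio(audio_path):
    # Hypothetical stand-in: a real system would run speech recognition plus
    # prosody/emotion analysis over the audio here.
    return "la la la ...", ["calm", "excited", "excited"]


def parse_image(image_path):
    # Hypothetical stand-in: a real system would run face and scene understanding.
    return {"identity": "singer_01", "scene": "studio"}


def plan_blueprint(image_path, audio_path, prompt):
    transcript, emotions = analyze_audio(audio_path)
    subject = parse_image(image_path)
    # Fuse the three modalities into one coherent plan; per the article, this
    # plan is injected across the diffusion model's attention layers to keep
    # rhythm and style consistent over the whole clip.
    return BlueprintPlan(transcript, emotions, subject, prompt)


if __name__ == "__main__":
    plan = plan_blueprint("photo.jpg", "song.mp3", "slow zoom up, arms moving rhythmically")
    print(plan)
```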

Compared with previous versions, Avatar2.0 makes a qualitative leap in expression control: smiles, anger, confusion, and moments of emphasis appear naturally, avoiding the "facial paralysis" of early AI characters. Motion design is also more flexible, going beyond talking-head lip-sync to full-body performance, including shoulder shrugs and gestural emphasis timed precisely to the music. On a benchmark of 375 test cases, each a reference image, audio, and text prompt triplet, the model achieves over 90% response accuracy in complex singing scenarios, and it supports real people, AI-generated images, and even animal or cartoon characters.
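The reported figure is simply the fraction of triplet test cases judged correct. Below is a toy scoring loop for such a benchmark; only the 375-case count and the over-90% figure come from the article, while the case format, file names, and the dummy judge are assumptions for illustration.

```python
def score_benchmark(cases, judge):
    """cases: list of (image, audio, prompt) triplets; judge: callable that
    returns True when the generated video matches the audio and prompt."""
    passed = sum(1 for image, audio, prompt in cases if judge(image, audio, prompt))
    return passed / len(cases)


# Dummy data and a dummy judge purely for demonstration:
# mark 340 of 375 cases as correct, i.e. roughly 90.7% accuracy.
cases = [(f"img_{i}.png", f"clip_{i}.wav", "sing with emotion") for i in range(375)]
verdicts = {f"img_{i}.png": i < 340 for i in range(375)}
accuracy = score_benchmark(cases, judge=lambda img, aud, prompt: verdicts[img])
print(f"response accuracy: {accuracy:.1%}")
```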

 Technical Support: High-quality Data and Two-stage Generation Framework

To achieve stable output of minute-scale videos, the Kuaishou Kling team built a rigorous training pipeline. They collected thousands of hours of video from speech, dialogue, and singing corpora, screened it with expert models along dimensions such as mouth clarity, audio-visual synchronization, and aesthetic quality, and, after manual review, retained a high-quality dataset of hundreds of hours. The generation framework uses a two-stage design: the first stage plans the global semantics through the blueprint video; the second stage takes the first and last frames of each sub-segment as conditions and generates the sub-segment videos in parallel, preserving identity consistency and dynamic coherence.
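A minimal sketch of that two-stage idea, under the stated assumptions: a coarse global pass yields the blueprint, and each sub-segment is then regenerated in parallel with its boundary frames pinned to the blueprint. All function names are hypothetical, and the "frames" are placeholder strings rather than real diffusion outputs.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_blueprint(plan, num_frames):
    # Hypothetical stand-in for the stage-one pass: a coarse, globally planned video.
    return [f"blueprint_frame_{i}" for i in range(num_frames)]


def generate_segment(first_frame, last_frame, plan):
    # Hypothetical stand-in for the stage-two pass: a detailed sub-clip whose
    # endpoints are pinned to the blueprint frames, which keeps identity and
    # motion consistent across segment boundaries.
    return [first_frame, f"...detailed frames following {plan!r}...", last_frame]


def two_stage_generate(plan, num_frames=240, segment_len=48):
    blueprint = generate_blueprint(plan, num_frames)
    # Split the blueprint into sub-segments and keep each segment's boundary frames.
    bounds = [(blueprint[i], blueprint[min(i + segment_len, num_frames) - 1])
              for i in range(0, num_frames, segment_len)]
    # Given their boundary frames, sub-segments are independent of one another,
    # so they can be generated in parallel.
    with ThreadPoolExecutor() as pool:
        segments = list(pool.map(lambda b: generate_segment(*b, plan), bounds))
    return [frame for seg in segments for frame in seg]


if __name__ == "__main__":
    video = two_stage_generate(plan="blueprint plan from the director module")
    print(len(video), "placeholder frames")
```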

Additionally, Avatar2.0 supports frame rates of up to 48fps and 1080p HD output, with animation smoothness well above the industry average. Users can try the basic features for free on the Kling platform (https://app.klingai.com/cn/ai-human/image/new), while advanced long-form videos require a subscription plan. Platform data shows that the number of generated videos grew 300% on launch day, with user feedback centering on "emotional authenticity" and "ease of operation."

 Application Prospects: Reshaping Short Video and Marketing Ecosystem

The model's rollout will deeply affect fields such as short video, e-commerce advertising, and educational content. Podcast creators can turn pure audio into a visual performance, instantly boosting its appeal on YouTube or Douyin; e-commerce sellers need only upload product photos and an audio explanation to generate multilingual demonstration videos at roughly one tenth the cost of traditional shooting. Music enthusiasts can experiment with "virtual concerts": feed in a melody generated by Suno AI, and Avatar2.0 can have the digital human "sing" an emotionally engaging MV, with support even for multi-person interactive scenes.

In the global AI wave, KlingAI Avatar2.0 is not just a technical iteration but a catalyst for the democratization of creation. It lets ordinary users "direct" professional-grade videos without barriers, foreshadowing a shift in content production from "labor-intensive" to "AI-powered." Experts caution, however, that the convenience brings copyright and ethical challenges, such as the compliant use of celebrities' likenesses.