VideoReTalking
Audio-driven video editing for high-quality lip synchronization.
Common Product, Video, Audio-driven, Lip-sync
VideoReTalking is a system that edits real-world talking-head videos so that the lip movements match a given input audio track, producing high-quality lip-synced output even when emotions vary. The system breaks this goal into three sequential tasks: (1) generating a face video with normalized expressions; (2) audio-driven lip synchronization; and (3) face enhancement to improve photorealism. Given a talking-head video, an expression editing network first modifies the expression in each frame to match a single standardized expression template, yielding a video with normalized expressions. This video and the given audio are then fed to a lip-sync network, which generates the lip-synced video. Finally, an identity-aware face enhancement network and post-processing improve the photorealism of the synthesized face. All three steps use learning-based methods, and the modules run one after another in a pipeline with no user intervention.
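The three-stage flow described above can be sketched as a simple function composition. This is a minimal illustration only, not the project's actual code: the `Frame` type, the stage functions `normalize_expression`, `lip_sync`, and `enhance_face`, and the wrapper `retalk` are hypothetical names, and each stage is a stub that merely records that it ran, standing in for a learned network.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for a video frame; `stages` logs what has been applied."""
    index: int
    stages: Tuple[str, ...] = ()


def _tag(frames: Sequence[Frame], stage: str) -> List[Frame]:
    # Return copies of the frames with `stage` appended to their history.
    return [replace(f, stages=f.stages + (stage,)) for f in frames]


def normalize_expression(frames: Sequence[Frame]) -> List[Frame]:
    # Stage 1 (stub): edit every frame toward one standardized expression template.
    return _tag(frames, "expression_normalized")


def lip_sync(frames: Sequence[Frame], audio: Sequence[float]) -> List[Frame]:
    # Stage 2 (stub): drive the mouth region from the audio signal.
    if not audio:
        raise ValueError("lip_sync needs a non-empty audio signal")
    return _tag(frames, "lip_synced")


def enhance_face(frames: Sequence[Frame]) -> List[Frame]:
    # Stage 3 (stub): identity-aware enhancement plus post-processing.
    return _tag(frames, "enhanced")


def retalk(frames: Sequence[Frame], audio: Sequence[float]) -> List[Frame]:
    # Run the three stages in order, with no user input between them.
    neutral = normalize_expression(frames)
    synced = lip_sync(neutral, audio)
    return enhance_face(synced)


video = [Frame(i) for i in range(3)]      # three placeholder frames
result = retalk(video, audio=[0.0, 0.1])  # placeholder audio samples
```

Each stage consumes only the output of the previous one, which is what lets the whole pipeline run end to end without manual steps. In a real system each stub would wrap a trained model.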