Goodbye Voice Actors? ByteDance's PersonaTalk Delivers Accurate Video Dubbing with Detailed Facial Expressions
ByteDance recently developed an AI model called PersonaTalk, which can dub videos with new audio while keeping lip movements synchronized and preserving the speaker's own speaking style. PersonaTalk is an attention-based two-stage framework consisting of a geometry construction stage and a face rendering stage. In the first stage, it uses a hybrid geometry estimation method to extract the speaker's facial geometry coefficients from a reference video. It then extracts and encodes audio features from the target audio and learns the speaker's personalized speaking style from statistical features of that geometry.
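To make the first stage more concrete, here is a minimal sketch of the idea in Python with NumPy. It is not ByteDance's implementation. The array sizes, the random stand-in for a learned projection, and the function names `style_statistics` and `cross_attention` are all assumptions made for illustration. The sketch summarizes a speaker's geometry coefficients with simple statistics (per-coefficient mean and standard deviation over time), then lets encoded audio features attend to those style tokens.

```python
import numpy as np


def style_statistics(geometry):
    """Summarize geometry of shape (frames, coeffs) as a (2, coeffs) array:
    per-coefficient mean and standard deviation over time."""
    return np.stack([geometry.mean(axis=0), geometry.std(axis=0)])


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    """Scaled dot-product attention; returns the attended values and the weights."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


rng = np.random.default_rng(0)

# Hypothetical inputs: 120 reference frames with 64 geometry coefficients each,
# and 50 target-audio frames already encoded as 32-dimensional features.
geometry = rng.normal(size=(120, 64))
audio = rng.normal(size=(50, 32))

style = style_statistics(geometry)           # shape (2, 64)

# Assumption: a random matrix stands in for a learned projection
# that maps the style statistics into the audio feature space.
w_style = rng.normal(size=(64, 32)) * 0.1
style_tokens = style @ w_style               # shape (2, 32)

# Audio frames act as queries; style tokens act as keys and values.
styled, weights = cross_attention(audio, style_tokens, style_tokens)

# Residual connection: the style information is added to the audio features.
fused = audio + styled                       # shape (50, 32)
```

In a trained model the projection and attention weights would be learned, and the fused features would go on to predict the target lip geometry. Here they only show how the data flows.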