Giant Network Launches Three Multi-Modal Models: Tackling Video Distortion and Making AI-Generated Songs Practical Through Voice Conversion
Giant Network AI Lab, in collaboration with Tsinghua University and Northwestern Polytechnical University, has launched three audio-visual multi-modal generation technologies: YingVideo-MV (music-driven video generation), YingMusic-SVC (zero-shot singing voice conversion), and YingMusic-Singer (singing voice synthesis). All three will be open-sourced; YingVideo-MV can generate a music video from nothing more than a piece of music and a single image of a person.