Tencent AI Lab has officially released and open-sourced SongGeneration, a large model for music generation. The model targets three persistent challenges in the field: audio quality, musicality, and generation speed. SongGeneration adopts a fusion architecture built on large models, which significantly improves audio quality while keeping generation fast, and it reportedly outperforms some commercial closed-source models in certain respects.

Beyond audio quality and speed, SongGeneration offers text control, multi-track synthesis, and style following, giving users more room to shape the result. Users can enter keywords to generate a complete piece of music that matches a specific style and mood. They can also upload a reference audio file, and SongGeneration will create a new track in a consistent style across genres such as pop and rock.

On the technical side, SongGeneration is backed by a comprehensive data pipeline that includes modules for sound separation, structural analysis, and lyric recognition, allowing audio data to be processed efficiently. The model has approximately 3 billion parameters and was pre-trained on an extensive collection of Chinese and English songs, which underpins its generative capabilities.
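
The pipeline described above can be pictured as a chain of annotation stages applied to each song. The following is only a minimal sketch of that idea, not Tencent's implementation: the `SongRecord` schema, the function names, the stage order, and the placeholder outputs are all assumptions made for illustration, and each stage body stands in for a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SongRecord:
    """One song moving through the pipeline (hypothetical schema)."""
    audio_path: str
    annotations: Dict[str, object] = field(default_factory=dict)


# A stage takes a record and returns it with extra annotations attached.
Stage = Callable[[SongRecord], SongRecord]


def separate_sources(record: SongRecord) -> SongRecord:
    # Placeholder: a real stage would run a sound-separation model here.
    record.annotations["stems"] = ["vocals", "accompaniment"]
    return record


def analyze_structure(record: SongRecord) -> SongRecord:
    # Placeholder: a real stage would detect song sections from the audio.
    record.annotations["sections"] = ["intro", "verse", "chorus"]
    return record


def recognize_lyrics(record: SongRecord) -> SongRecord:
    # Assumption: lyric recognition runs on the separated vocal stem,
    # so it must come after sound separation.
    if "vocals" not in record.annotations.get("stems", []):
        raise ValueError("lyric recognition expects a separated vocal stem")
    record.annotations["lyrics"] = "<transcribed lyrics>"
    return record


# Assumed stage order; the article lists the modules but not their order.
PIPELINE: List[Stage] = [separate_sources, analyze_structure, recognize_lyrics]


def run_pipeline(record: SongRecord, stages: List[Stage]) -> SongRecord:
    """Apply each stage in turn and return the fully annotated record."""
    for stage in stages:
        record = stage(record)
    return record


if __name__ == "__main__":
    result = run_pipeline(SongRecord("example.wav"), PIPELINE)
    print(sorted(result.annotations))
```

Keeping each module as an independent function means a stage can be swapped or reordered without touching the others, which is one plausible reason to organize such a pipeline in modules.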

The release of SongGeneration marks an advance in music generation technology and supports the vision that "everyone can create" music. It gives content creators, game developers, and musicians a capable tool, and it contributes to an open, flexible music AI ecosystem in which more people can take part in music creation.

Try SongGeneration here: https://huggingface.co/spaces/tencent/SongGeneration

Key points:

🎵 SongGeneration is an open-source large model for music generation released by Tencent AI Lab, aiming to improve audio quality, musicality, and generation speed.

🎤 Users can generate new music in a chosen style by entering keywords or uploading a reference audio file, with an intuitive and highly controllable creative process.

🎶 The model has approximately 3 billion parameters and was pre-trained on a large collection of Chinese and English songs, bringing more intelligent tooling to music creation.