Competition in the AI voice generation field is heating up. Recently, two undergraduate students from South Korea partnered to create Dia, an AI voice model its creators claim can rival Google's NotebookLM. Despite their limited experience in AI, they developed this open-source voice generation tool in just three months.
Dia was trained using Google's TPU Research Cloud, a program that provides researchers with free access to TPU AI chips. The model has 1.6 billion parameters and generates dialogue from a provided script: users can adjust speaker tone and insert non-verbal cues such as coughs and laughter. As a rule of thumb, a larger parameter count tends to correlate with stronger model performance.
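To make the script format concrete, here is a minimal sketch of how a two-speaker prompt with a non-verbal cue is typically passed to the model, following the usage pattern shown in the project's README; the speaker tags, class, and method names should be treated as illustrative, since the exact API may differ between releases.

```python
# Hedged sketch of scripted dialogue generation with Dia, based on the
# usage pattern in the project README. Exact class/method names may vary.
import soundfile as sf
from dia.model import Dia

# Load the pretrained checkpoint from Hugging Face.
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# A two-speaker script: [S1]/[S2] mark the speakers, and parenthesized
# cues like (laughs) or (coughs) insert non-verbal sounds.
script = (
    "[S1] Did you hear about the new open-source voice model? "
    "[S2] I did! (laughs) Two students built it in just a few months."
)

# Generate the dialogue audio and write it to disk.
audio = model.generate(script)
sf.write("dialogue.wav", audio, 44100)
```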
Dia is currently available through Hugging Face and GitHub and runs on most modern PCs equipped with at least 10GB of VRAM. Without specific style instructions, it generates a random voice for each speaker, though it also offers voice cloning capabilities.
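Since the roughly 10GB VRAM figure is the main hardware constraint, a quick check like the following (assuming a PyTorch installation with CUDA support) can confirm whether a local GPU is up to the task before downloading the weights; the helper function name here is illustrative and not part of Dia.

```python
# Minimal sketch: check whether the local GPU meets Dia's ~10GB VRAM
# requirement before pulling the weights from Hugging Face.
# Assumes PyTorch with CUDA; `has_enough_vram` is our own helper name.
import torch

def has_enough_vram(required_gb: float = 10.0) -> bool:
    """Return True if the first CUDA device reports at least `required_gb` of memory."""
    if not torch.cuda.is_available():
        return False
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return total_bytes / (1024 ** 3) >= required_gb

if __name__ == "__main__":
    print("Meets ~10GB VRAM requirement:", has_enough_vram())
```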
In initial TechCrunch testing, Dia performed well, seamlessly generating two-way conversations on various topics with voice quality comparable to other tools on the market. Notably, its voice cloning feature was among the easiest to use of the tools the reporters tested.
However, Dia's lack of safeguards raises concerns: users could easily misuse it to create deepfakes or fraudulent recordings. On Dia's project page, Nari Labs urges users not to employ the model for fraud or other illegal activities, but it also disclaims responsibility for misuse. Furthermore, Nari Labs hasn't disclosed the data sources used to train Dia, which raises potential copyright questions.
Toby Kim, co-founder of Nari Labs, said the team plans to build a synthetic voice platform with social features on top of Dia and hopes to support more languages in the future. Nari Labs also plans to release a technical report on Dia to broaden its reach.
Project: https://github.com/nari-labs/dia
Key Highlights:
🌟 Dia, an AI voice model created by two undergraduate students, generates dialogues and supports voice cloning.
🚀 Trained using Google's TPU Research Cloud, Dia has 1.6 billion parameters and runs on most modern PCs with at least 10GB of VRAM.
⚠️ The model lacks safeguards against misuse; Nari Labs disclaims responsibility for abuse and hasn't disclosed the training data sources.