A two-person startup called Nari Labs has released Dia, a 1.6-billion-parameter text-to-speech (TTS) model designed to generate natural-sounding conversations directly from text prompts. Co-founder Toby Kim claims Dia outperforms proprietary offerings such as ElevenLabs, the AI podcast-generation feature of Google's NotebookLM, and potentially even OpenAI's recently released gpt-4o-mini-tts.
Kim stated on X (formerly Twitter) that Dia's quality rivals NotebookLM's podcast functionality and surpasses ElevenLabs Studio and Sesame's open models. He revealed the model was built with "zero funding" and emphasized that the team were not AI experts at the outset, launching the project out of a love for NotebookLM's podcast feature. According to Kim, they tried every TTS API on the market and found none sufficiently natural. He also expressed gratitude to Google for allowing them to use its Tensor Processing Unit (TPU) chips to train Dia.
Currently, Dia's code and weights are open-sourced on Hugging Face and GitHub for users to download and deploy locally. Individual users can also try it online via a Hugging Face Space.
Advanced Controls and Enhanced Customization
Dia supports nuanced features including emotional intonation, speaker labels, and non-verbal audio cues such as (laugh), (cough), and (clear throat), all controlled purely through text. Nari Labs' examples demonstrate Dia's ability to correctly interpret these labels, a feature often unreliable in other models. The model currently supports English only, and the voice varies on each run unless the user fixes the generation seed or provides an audio prompt for voice cloning.
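Since speakers and non-verbal cues are expressed directly in the input text, a dialogue script can be assembled with ordinary string handling. The sketch below assumes the [S1]/[S2] speaker-tag and parenthesized-cue syntax shown in Nari Labs' published examples; verify the exact syntax against the official README before use.

```python
# Sketch: building a Dia-style dialogue prompt as plain text.
# The [S1]/[S2] tags and cues like (laughs) are assumptions taken
# from Nari Labs' examples, not a documented formal grammar.

def build_prompt(turns):
    """Join (speaker, line) pairs into a single text prompt."""
    return " ".join(f"[{speaker}] {line}" for speaker, line in turns)

prompt = build_prompt([
    ("S1", "Did you hear about the new open TTS model?"),
    ("S2", "I did! (laughs) Only two people built it."),
    ("S1", "(clears throat) Let's try a demo."),
])
print(prompt)
```

The resulting string would be passed to the model as-is; because the cues are ordinary text, no separate markup or SSML layer is needed.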
Nari Labs provides comparison examples on its website showcasing Dia's advantages over ElevenLabs Studio and Sesame CSM-1B in handling natural rhythm, non-verbal expressions, multi-emotional dialogues, complex rhythmic content, and maintaining voice style through audio prompts. Nari Labs notes that Sesame's demo may have used a larger internal version of the model than the released CSM-1B.
Model Access and Technical Specifications
Developers can obtain Dia from Nari Labs' GitHub repository and Hugging Face model page. The model runs on PyTorch 2.0+ and CUDA 12.6, requiring approximately 10GB of VRAM. Nari Labs plans to offer CPU support and quantized versions in the future.
Dia is distributed under the permissive Apache 2.0 open-source license, which permits commercial use. Nari Labs emphasizes a prohibition against unethical use and encourages responsible experimentation. The project's development was supported by the Google TPU Research Cloud, Hugging Face's ZeroGPU grant program, and related prior research. Despite being a team of only two engineers, Nari Labs actively invites community contributions.