The Doubao App now has a voice conversation feature that supports Cantonese, Sichuan dialect, Northeastern dialect, and Shaanxi dialect. Users can issue text or voice commands to converse in these dialects using the Sweet Peach voice. The feature is built on dialect transfer technology, which lets a single voice switch smoothly between multiple dialects, and it has reasoning capabilities that let it respond naturally based on context.
OpenAI integrates ChatGPT voice mode into the main interface, enabling direct voice conversations with real-time visual support like maps and images, plus automatic transcription for easy review.
Google introduces Gemini voice assistant on Google TV, replacing Google Assistant. This upgrade enables natural conversation for content access and complex cross-context queries like personalized movie recommendations.
Google has launched StreetReaderAI, a prototype system that helps blind and low-vision users independently explore Google Street View through natural language interaction. The system integrates computer vision, geographic information systems, and large language models to deliver a multimodal, AI-driven, real-time conversational street view experience. It goes beyond the limitations of traditional voice announcements and gives users more freedom in accessible urban exploration.
Intelligent AI voice agent, enabling natural conversations and supporting multiple languages, used for business call automation.
Step-Audio is an open-source intelligent voice interaction framework that supports multilingual conversation, emotional intonation, and voice cloning.
An AI chatbot project based on ESP32 that supports multilingual conversations and voiceprint recognition.
A new foundational voice-to-voice model that delivers a human-like conversation experience.
Marvis-AI
Marvis is an advanced conversational voice model designed for real-time streaming text-to-speech synthesis. It focuses on efficiency and ease of use, supporting high-quality real-time voice synthesis on consumer devices such as Apple chips, iPhones, iPads, and Macs.
webbigdata
VoiceCore is a commercially available Japanese voice AI agent model that focuses on enabling AI to have natural conversations with humans through voice. It has the ability to express emotions and non-verbal sounds and supports multiple voice style selections.
thomasgauthier
Hugging Face implementation of Sesame Technology's Conversational Speech Model (CSM), supporting text-to-speech and voice cloning tasks.
gpt-omni
Mini-Omni2 is a fully interactive multimodal model capable of understanding image, audio, and text inputs, and engaging in end-to-end voice conversations with users.
An AI voice call system based on the MCP protocol that uses VoIP technology to enable AI assistants like Claude to automatically make calls and conduct intelligent conversations. It supports multiple SIP protocols and audio codecs.
Voice Mode is a tool that provides natural voice conversation capabilities for AI assistants and supports human-machine voice interaction with LLMs such as Claude and ChatGPT through the MCP protocol.
An MCP server that supports voice interaction with LLMs such as Claude. You only need an OpenAI API key and a microphone/speaker to have a real-time voice conversation.
An intelligent conversational robot project based on large models. It supports multi-platform access and multiple AI models, offers text, voice, image processing, and plugin extension capabilities, and can be used to build customized enterprise AI applications.
The Sinch MCP Server is a developer preview toolset that provides multi-functional services for interacting with the Sinch API, including conversation, verification, voice, and email tools, and supports use through MCP clients such as Claude Desktop.
A voice conversation server that connects Claude AI and ElevenLabs.