The TEN Agent team recently announced the official open-source release of its two core models, **TEN Voice Activity Detection (VAD)** and **TEN Turn Detection**, providing strong technical building blocks for real-time, multimodal speech AI agents.

The release marks a significant step in the TEN framework's effort to democratize speech interaction technology through open-source collaboration. Below is AIbase's compilation of the latest updates, with an analysis of the two models' functions, advantages, and potential impact on the industry.


TEN VAD: Low-latency, High-performance Voice Activity Detection

TEN VAD is a real-time voice activity detector designed for enterprise-level applications, known for its low latency, lightweight design, and high performance. According to official information and social media feedback, TEN VAD can detect voice activity at the frame level with remarkable precision, significantly outperforming commonly used industry models such as WebRTC VAD and Silero VAD. Here are its key highlights:

- **Low computational complexity**: The TEN VAD library is small and computationally lightweight, exposing a cross-platform C API that covers Linux x64, Windows, macOS, Android, and iOS. It also provides Python bindings for Linux x64 (sketched after this list) and WASM support for the web ([Hugging Face](https://huggingface.co/TEN-framework/ten-vad)).

- **High accuracy and low latency**: Compared to Silero VAD, TEN VAD detects speech-to-non-speech transitions with lower latency, so it can pick up short pauses quickly, which suits real-time interactive scenarios. Tests show its real-time factor (RTF) is excellent across a range of CPU platforms ([Hugging Face](https://huggingface.co/TEN-framework/ten-vad)).

- **Latest open-source progress**: In June 2025, the TEN team open-sourced the ONNX model and preprocessing code, enabling deployment on any platform and hardware architecture with ONNX support. WASM+JS support further extends its reach on the web.
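
For a concrete feel of the frame-level API, here is a minimal sketch using the project's Python binding. The `TenVad` class, its `(hop_size, threshold)` constructor, and the per-frame `process()` call follow the usage shown in the ten-vad repository, but treat the exact signatures as assumptions and verify them against the README of your installed version:

```python
# Minimal sketch: frame-level voice activity detection with the TEN VAD
# Python binding (API per the ten-vad repo; verify against your version).
import scipy.io.wavfile as wavfile
from ten_vad import TenVad  # assumed import path of the Python binding

HOP_SIZE = 256    # samples per frame: 16 ms at 16 kHz
THRESHOLD = 0.5   # speech-probability cutoff for the binary flag

sr, data = wavfile.read("input_16k_mono.wav")  # 16 kHz, 16-bit mono expected
vad = TenVad(HOP_SIZE, THRESHOLD)

for i in range(data.shape[0] // HOP_SIZE):
    frame = data[i * HOP_SIZE : (i + 1) * HOP_SIZE]
    probability, is_speech = vad.process(frame)  # per-frame score and 0/1 flag
    print(f"frame {i}: p={probability:.3f} speech={bool(is_speech)}")
```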

Developers on social media have warmly welcomed the open-source release of TEN VAD, arguing that its performance surpasses traditional VAD models and that it gives real-time voice assistant development a powerful new tool.

TEN Turn Detection: Intelligent Dialogue Turn Management

**TEN Turn Detection** is an intelligent turn detection model designed for full-duplex voice communication. It tackles one of the hardest problems in human-computer dialogue: accurately determining when a user has finished speaking, and handling interruptions in a context-aware way. Here are its key features:

- **Semantic analysis capabilities**: Built on the Qwen2.5-7B Transformer model, TEN Turn Detection precisely distinguishes the "completed," "waiting," and "unfinished" states of a user's speech by analyzing the conversation's semantic context and language patterns. For example, it recognizes "Hey, I want to ask a question..." as an unfinished statement, so the AI avoids interrupting prematurely ([Hugging Face](https://huggingface.co/TEN-framework/TEN_Turn_Detection)). An inference sketch follows this list.

- **Multilingual support**: Currently supports English and Chinese, accurately identifying turn signals in both languages, which makes it suitable for global applications ([Hugging Face](https://huggingface.co/TEN-framework/TEN_Turn_Detection)).

- **Excellent performance**: On public test datasets, TEN Turn Detection outperforms other open-source turn detection models on every reported metric, and it is particularly strong in dynamic real-time conversations ([Hugging Face](https://huggingface.co/TEN-framework/TEN_Turn_Detection)).

- **Natural interaction experience**: Paired with TEN VAD, TEN Turn Detection lets AI agents wait for an appropriate moment to speak or handle user interruptions in the right context, creating a more natural conversational experience ([Agora blog](https://www.agora.io/en/blog/making-voice-ai-agents-more-human-with-ten-vad-and-turn-detection/)).
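
As a rough illustration of how such a model is queried, the sketch below loads it as a causal language model via Hugging Face `transformers` and decodes a short turn-state label. The chat-template prompt and the exact label strings are assumptions based on the model card; the Hugging Face page is the authoritative reference:

```python
# Rough sketch: asking TEN Turn Detection for a turn-state label.
# Prompt format and label strings are assumptions; see the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TEN-framework/TEN_Turn_Detection"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def turn_state(utterance: str) -> str:
    """Return the model's label for the utterance (e.g. completed / waiting / unfinished)."""
    messages = [{"role": "user", "content": utterance}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=8, do_sample=False)
    return tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)

print(turn_state("Hey, I want to ask a question..."))  # expect an "unfinished"-type label
```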

TEN Agent Ecosystem: The Foundation of Multimodal Real-time AI

TEN Agent is the showcase project of the TEN framework. It integrates core components such as TEN VAD and TEN Turn Detection and supports multimodal real-time interactions spanning voice, video, and text. Here are its roles within the ecosystem:

- **Seamless integration**: As TEN framework plugins, TEN VAD and TEN Turn Detection can be dropped into a voice-agent pipeline with simple configuration, alongside services such as Deepgram (speech-to-text) and ElevenLabs (text-to-speech); a conceptual sketch of that pipeline follows this list.

- **Multi-scenario applications**: TEN Agent supports a wide range of use cases, from intelligent customer service and real-time translation to virtual companions. For example, combined with the Google Gemini multimodal API, TEN Agent can enable real-time visual and screen-sharing detection, expanding its applications in fields such as education and healthcare.

- **Open-source collaboration**: All components of the TEN framework (except part of the TEN VAD code) are fully open-sourced, and community developers are encouraged to contribute code, fix bugs, and suggest new features. The TEN team coordinates collaboration through GitHub Issues and Projects, which has attracted broad developer participation.
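
To make the division of labor concrete, here is a hypothetical, heavily simplified event loop showing where the two models sit in such a pipeline. Every name in it (`vad`, `transcribe`, `turn_state`, `respond`) is an illustrative stand-in rather than the TEN framework's actual plugin API, which is wired up declaratively inside the framework:

```python
# Hypothetical sketch: VAD gates incoming audio frames; turn detection
# decides when the agent may respond. All names are illustrative, not
# the TEN framework's real extension API.
def agent_loop(audio_frames, vad, transcribe, turn_state, respond):
    buffered = []                                  # frames of the current utterance
    for frame in audio_frames:
        _, is_speech = vad.process(frame)          # frame-level VAD (e.g. TEN VAD)
        if is_speech:
            buffered.append(frame)                 # user is talking: keep buffering
        elif buffered:
            text = transcribe(buffered)            # STT service (e.g. Deepgram)
            if turn_state(text) == "completed":    # semantic end-of-turn check
                respond(text)                      # LLM reply + TTS (e.g. ElevenLabs)
                buffered = []
            # otherwise: likely a short pause mid-thought, so keep waiting
```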

Project: https://github.com/TEN-framework/ten-framework