On July 22, Tencent Hunyuan announced that its self-developed ASR (Automatic Speech Recognition) large model has been officially deployed on the ima platform, bringing voice input to the ima mobile app for the first time and giving users a convenient "speak as you think" experience. Users can now dictate questions or record ideas by voice instead of typing on the keyboard, greatly improving input efficiency.
The Tencent Hunyuan ASR large model is distinguished by its robust recognition and semantic understanding, transcribing speech accurately even in noisy environments. According to the announcement, it can recognize 300 words per minute, about four times faster than typing, with more accurate and natural results. The model adopts what Tencent describes as the industry's first streaming ASR architecture based on dual encoders. Compared with traditional ASR systems, its semantic understanding is significantly improved, and it performs especially well on code-switched Chinese-English speech.
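The announcement does not detail how the dual encoders interact, but a streaming dual-encoder design is commonly understood as pairing a causal encoder (which emits features chunk by chunk with low latency) with a second encoder that re-reads the accumulated context for richer semantic features. The toy sketch below illustrates that idea only; all names, the chunk size, and the fusion scheme are illustrative assumptions, not Tencent Hunyuan's actual design.

```python
# Toy sketch of a dual-encoder streaming front end (illustrative only;
# all names, features, and the fusion rule are assumptions, not the
# Hunyuan model's real architecture).
from typing import List, Tuple

CHUNK = 4  # frames processed per streaming step (assumed)

def causal_encode(chunk: List[float], state: float) -> Tuple[List[float], float]:
    """Streaming encoder: each frame sees only past context (running average)."""
    out = []
    for x in chunk:
        state = 0.9 * state + 0.1 * x  # exponential moving average as a stand-in
        out.append(state)
    return out, state

def context_encode(history: List[float]) -> float:
    """Second encoder: re-reads the full history for one global (semantic) feature."""
    return sum(history) / len(history) if history else 0.0

def dual_encoder_stream(frames: List[float]) -> List[float]:
    """Fuse low-latency causal features with the global context feature."""
    fused: List[float] = []
    history: List[float] = []
    state = 0.0
    for i in range(0, len(frames), CHUNK):
        chunk = frames[i:i + CHUNK]
        local, state = causal_encode(chunk, state)  # streaming path
        history.extend(chunk)
        g = context_encode(history)                 # context path
        fused.extend(0.5 * l + 0.5 * g for l in local)  # simple averaging fusion
    return fused

frames = [float(i) for i in range(8)]
feats = dual_encoder_stream(frames)
print(len(feats))  # one fused feature per input frame → 8
```

The point of the split is that the causal path keeps latency low while the context path supplies the sentence-level information needed for better semantic understanding, such as handling code-switched speech.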
The voice input function on the ima platform covers several scenarios, including knowledge base Q&A and note creation. When querying the knowledge base or asking questions on the homepage, users can dictate long questions directly by voice; when writing notes, ima acts as a note-taking assistant that listens and helps create content, and can quickly continue writing from existing notes for a seamless workflow. In addition, iOS users can add a desktop widget for faster access to Q&A.
The Tencent Hunyuan team stated that it will continue to optimize the ASR large model, improving recognition of dialects and additional languages and expanding language coverage to meet the needs of different scenarios. The launch of voice input both demonstrates Tencent Hunyuan's technical strength in speech recognition and gives users a more efficient, convenient input method, marking a step forward in intelligent interaction.