On September 1, Meituan officially launched the LongCat-Flash model series and has since open-sourced two versions, LongCat-Flash-Chat and LongCat-Flash-Thinking, which drew considerable attention from developers. Today, the LongCat team announced a new family member: LongCat-Flash-Omni. Building on the original foundation, the model introduces several technical innovations and moves the series into full-modal real-time interaction.
LongCat-Flash-Omni inherits the efficient architecture of the LongCat-Flash series, adopting the Shortcut-Connected MoE (ScMoE) design and integrating efficient multimodal perception modules and a speech reconstruction module. Despite its 560 billion total parameters (about 27 billion activated), it still delivers low-latency, real-time audio-video interaction, giving developers a more efficient foundation for multimodal applications.
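To make the sparse-activation claim concrete, here is a minimal, single-device sketch of top-k expert routing in PyTorch. All dimensions, expert counts, and module layouts are illustrative assumptions, not LongCat's actual configuration; in particular, the defining systems-level trick of ScMoE, a cross-layer shortcut that overlaps dense computation with expert communication, is not modeled in this toy.

```python
import torch
import torch.nn as nn

class TopKMoELayer(nn.Module):
    """Sketch of standard top-k MoE routing (an assumption, not LongCat's code).
    Each token is processed by only k of the n experts, which is why the
    activated parameter count is a small fraction of the total."""

    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x).softmax(dim=-1)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Dispatch each token only to its top-k experts.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[..., k] == e
                if mask.any():
                    out[mask] += topk_scores[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return x + out  # residual connection around the MoE layer
```

With 8 experts and top-2 routing, each token runs only a quarter of the expert parameters; scaled up, the same principle is how a model with 560 billion total parameters can activate only about 27 billion per token.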

According to comprehensive evaluation results, LongCat-Flash-Omni performs strongly on full-modal benchmarks, reaching state-of-the-art (SOTA) level among open-source models. It also remains competitive on key single-modal tasks, including text, image, and video understanding as well as speech perception and generation, achieving the stated goal of "no loss of intelligence across modalities."
LongCat-Flash-Omni adopts a unified full-modal architecture that combines offline multimodal understanding with real-time audio-video interaction. Its design is end-to-end throughout: visual and audio encoders serve as multimodal sensors, the model directly generates text and speech tokens, and a lightweight audio decoder reconstructs natural speech waveforms, keeping real-time interaction latency low.
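The paragraph above describes a dataflow in which encoders feed an LLM backbone that emits text and speech tokens, and a small decoder turns the speech tokens back into audio. The sketch below mirrors that shape with deliberately tiny stand-in modules; every dimension, vocabulary size, and layer choice is a placeholder assumption rather than LongCat's real components.

```python
import torch
import torch.nn as nn

class OmniDataflowSketch(nn.Module):
    """Toy end-to-end dataflow: sensors -> backbone -> text/speech tokens
    -> waveform. Linear layers stand in for real vision/speech encoders."""

    def __init__(self, d=512, text_vocab=32000, speech_vocab=4096, frame=240):
        super().__init__()
        self.vision_enc = nn.Linear(768, d)    # stand-in visual "sensor"
        self.audio_enc = nn.Linear(128, d)     # stand-in audio "sensor"
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.text_head = nn.Linear(d, text_vocab)      # text tokens out
        self.speech_head = nn.Linear(d, speech_vocab)  # speech tokens out
        self.speech_emb = nn.Embedding(speech_vocab, d)
        # "Lightweight audio decoder": speech tokens -> waveform frames.
        self.audio_dec = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, frame))

    def forward(self, img_feats, aud_feats):
        # Both encoders feed one shared token sequence into the backbone.
        x = torch.cat([self.vision_enc(img_feats), self.audio_enc(aud_feats)], dim=1)
        h = self.backbone(x)
        text_logits = self.text_head(h)
        speech_tokens = self.speech_head(h).argmax(dim=-1)
        # Reconstruct an audio waveform from the generated speech tokens.
        waveform = self.audio_dec(self.speech_emb(speech_tokens)).flatten(1)
        return text_logits, waveform

model = OmniDataflowSketch()
text_logits, waveform = model(torch.randn(1, 16, 768), torch.randn(1, 50, 128))
```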
In addition, the model introduces a progressive early-fusion training strategy to handle the heterogeneous data distributions of different modalities in full-modal training. The strategy ensures that modalities cooperate effectively during training and improves overall model performance.
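As a rough illustration of what a progressive fusion curriculum could look like, the sketch below introduces modalities in stages with shifting mixing ratios. The stage order and ratios here are invented for illustration only; the team's official documentation defines the actual schedule.

```python
import random

# Hypothetical staged curriculum: modalities enter training gradually so
# the model is not confronted with all heterogeneous distributions at once.
STAGES = [
    {"name": "text_pretrain", "mix": {"text": 1.0}},
    {"name": "add_speech",    "mix": {"text": 0.7, "speech": 0.3}},
    {"name": "add_vision",    "mix": {"text": 0.5, "speech": 0.2, "image": 0.3}},
    {"name": "full_omni",     "mix": {"text": 0.4, "speech": 0.2, "image": 0.2, "video": 0.2}},
]

def sample_batch(stage, datasets, batch_size=8):
    """Draw a batch whose modality composition follows the stage's ratios."""
    modalities = list(stage["mix"])
    weights = [stage["mix"][m] for m in modalities]
    picks = random.choices(modalities, weights=weights, k=batch_size)
    return [(m, random.choice(datasets[m])) for m in picks]

datasets = {m: [f"{m}_sample_{i}" for i in range(100)]
            for m in ("text", "speech", "image", "video")}
for stage in STAGES:
    batch = sample_batch(stage, datasets)
    print(stage["name"], [m for m, _ in batch])
```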
In benchmark tests, LongCat-Flash-Omni performs well across domains. Notably, on text and image understanding tasks its capabilities did not regress at all; they improved significantly. Its audio and video processing is equally strong, and it leads many open-source models in the naturalness and smoothness of real-time audio-video interaction.
The LongCat team has also opened new ways to try the model: users can test image and file upload and voice calls on the official website, and the official LongCat App is now available with online search and voice calls, with video calling planned for a future release.
Hugging Face:
https://huggingface.co/meituan-longcat/LongCat-Flash-Omni
GitHub:
https://github.com/meituan-longcat/LongCat-Flash-Omni
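For readers who want to try the weights, a hypothetical quick-start is sketched below. It assumes the checkpoint supports the common Hugging Face trust_remote_code loading path, which is not confirmed here; the model card and GitHub README are authoritative for the supported runtime, and a 560B checkpoint will in practice require a multi-GPU server.

```python
# Hypothetical quick-start; verify the actual entry point on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meituan-longcat/LongCat-Flash-Omni"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # assumption: repo ships custom model code
    torch_dtype="auto",
    device_map="auto",       # requires accelerate; shards across GPUs
)

inputs = tokenizer("Hello, LongCat!", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```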

