On April 3, the MiTi team officially released LongCat-Next, a native multimodal large model. The model breaks with the traditional "language foundation + plugin" architecture by converting images, speech, and text into a single shared space of discrete tokens, allowing the AI to natively "see" and "hear" the physical world the same way it processes text.
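The core idea can be sketched in a few lines. This is an illustration only: the vocabulary layout, sizes, and offsets below are assumptions for explanatory purposes, not LongCat-Next's actual configuration. Each modality's tokenizer emits integer codes, which are shifted into disjoint ranges of one shared vocabulary so that a single autoregressive model can consume a mixed-modality sequence:

```python
# Illustrative only: vocabulary sizes and layout are assumed, not LongCat-Next's real config.
TEXT_VOCAB = 50_000    # text tokens occupy ids [0, 50_000)
IMAGE_VOCAB = 8_192    # image codes shift into [50_000, 58_192)
AUDIO_VOCAB = 4_096    # audio codes shift into [58_192, 62_288)

IMAGE_OFFSET = TEXT_VOCAB
AUDIO_OFFSET = TEXT_VOCAB + IMAGE_VOCAB

def to_shared_ids(modality, codes):
    """Map per-modality tokenizer codes into the single shared vocabulary."""
    offset = {"text": 0, "image": IMAGE_OFFSET, "audio": AUDIO_OFFSET}[modality]
    return [offset + c for c in codes]

# A mixed "read + see" prompt becomes one flat id stream for one autoregressive model:
sequence = (
    to_shared_ids("text", [101, 7, 42])        # e.g. "describe this image:"
    + to_shared_ids("image", [3051, 19, 877])  # discrete image codes from a visual tokenizer
    + to_shared_ids("text", [88, 12])          # the model's textual answer
)
```

Because every id lives in the same vocabulary, the same parameters, attention, and loss apply to all modalities, which is exactly the unification claim the article makes.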

Technical Core: DiNA Architecture Achieves "Modality Internalization"

To break down the barriers between modalities, MiTi has built the DiNA (Discrete Native Autoregressive) architecture, achieving deep unification in multimodal modeling:

  • Full Modality Unification: Whether it's text, images, or audio, the model uses the same set of parameters, attention mechanisms, and loss functions.

  • Symmetry of Understanding and Generation: Under a unified mathematical form, predicting text tokens is "understanding," while predicting image tokens is "generation." The two tasks show significant synergy during training.

  • Extreme Compression: Using the dNaViT visual tokenizer, it supports arbitrary-resolution inputs and achieves up to 28x pixel-space compression through 8 layers of residual vector quantization, while preserving the key details needed for tasks such as OCR and financial report parsing.
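Residual vector quantization (RVQ), the technique named above, is worth unpacking: each level quantizes the residual left over by the previous level, so a vector collapses into a short stack of integer codes. A minimal sketch with toy sizes (the codebook dimensions here are illustrative; only the 8-level depth comes from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_LEVELS = 8      # matches the 8 RVQ layers described above
CODEBOOK_SIZE = 16  # toy size; real tokenizers use far larger codebooks
DIM = 4             # toy embedding dimension

# One small codebook per level; in practice these are learned, not random.
codebooks = [rng.normal(size=(CODEBOOK_SIZE, DIM)) for _ in range(NUM_LEVELS)]

def rvq_encode(x, codebooks):
    """Return one code index per level; each level quantizes the running residual."""
    residual = x.copy()
    indices = []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        indices.append(idx)
        residual = residual - cb[idx]  # next level models what this level missed
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the chosen code vector from every level."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

x = rng.normal(size=DIM)
codes = rvq_encode(x, codebooks)      # 8 small integers stand in for the whole vector
x_hat = rvq_decode(codes, codebooks)  # approximate reconstruction
```

The compression comes from replacing a continuous patch embedding with a handful of integer indices; stacking levels lets later codebooks recover detail the earlier ones discarded.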

Empirical Performance: Discrete Modeling Has No "Ceiling"

LongCat-Next demonstrates performance surpassing specialized models across multiple dimensions, effectively refuting the traditional view that "discretization inevitably leads to information loss":

  • Fine-Grained Perception: In dense text scenarios on OmniDocBench, its performance not only exceeds Qwen3-Omni but also outperforms the specialized visual model Qwen3-VL.

  • Visual Reasoning: It achieved an impressive score of 83.1 on MathVista, demonstrating strong visual and logical reasoning capabilities.

  • Cross-Modal Collaboration: While maintaining leading language capabilities (C-Eval 86.80), it supports low-latency parallel generation of text and speech, as well as customizable voice cloning.

Industry Insight: The Foundation for Physical World AI

For a long time, large models have been language-centered systems. The significance of LongCat-Next lies in proving that physical information can be discretized and modeled like language. When AI has a unified "native language," it becomes smarter and more intuitive when calling tools, writing code, and understanding complex charts.

Currently, MiTi has open-sourced the LongCat-Next model and the dNaViT tokenizer. This compact, high-potential native discrete architecture will provide important tools for developers to build AI capable of perceiving and acting upon the real world.