Breaking Traditions! FUDOKI Model Makes Multi-Modal Generation and Understanding More Flexible and Efficient

AIbase基地

Published in AI News · 4 min read · Jun 10, 2025

In recent years, large language models (LLMs) have made significant progress on multi-modal tasks, showing strong potential for both understanding and generating content. Most current multi-modal models, however, still rely on autoregressive (AR) architectures, which generate tokens one at a time in a fixed order and therefore leave little flexibility in the inference process. To address this limitation, a research team from The University of Hong Kong and Huawei Noah's Ark Lab has proposed a new model called FUDOKI.

The core innovation of FUDOKI is its architecture, which is built on mask-free discrete flow matching. Unlike traditional autoregressive models, FUDOKI denoises all tokens in parallel, so each token can draw on context from both directions, significantly enhancing performance on complex reasoning and generation tasks. The model handles image generation and text understanding within a single unified framework, bridging the gap between the two.
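The difference in available context can be illustrated with attention-visibility masks. The sketch below is only an illustration under assumptions: the function names are hypothetical, and it shows the general contrast between causal (autoregressive) and bidirectional visibility rather than FUDOKI's actual implementation.

```python
def causal_mask(n):
    """Autoregressive visibility: position i may only see positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Bidirectional visibility: every position may see every other position."""
    return [[True] * n for _ in range(n)]


def visible_positions(mask, i):
    """Return the indices that position i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


if __name__ == "__main__":
    n = 4
    print("causal, position 1:       ", visible_positions(causal_mask(n), 1))
    print("bidirectional, position 1:", visible_positions(bidirectional_mask(n), 1))
```

Under the causal mask, position 1 sees only positions 0 and 1. Under the bidirectional mask it sees all four positions, which is what lets later tokens inform earlier ones.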

Figure: Brain Large Model AI (image generated by AI, provided by Midjourney)

A key advantage of the model is its mask-free design, which makes the generation process more flexible. During inference, FUDOKI can dynamically adjust what it has already generated, in a way that resembles how people rethink and correct a draft. FUDOKI also performs well in image generation: it scores 0.76 on the GenEval benchmark, surpassing autoregressive models of the same size and demonstrating high generation quality and semantic accuracy.
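A toy comparison can make the practical effect of a mask-free design easier to see. The sketch below is a simplified illustration, not FUDOKI's sampler: `toy_denoiser` is a hypothetical stand-in for a trained network that always predicts the clean sequence, and `masked_step` and `mask_free_step` are illustrative names. In the mask-based update only `[MASK]` positions can change, while in the mask-free update every position can be revised.

```python
import random

MASK = "[MASK]"
TARGET = ["a", "red", "apple"]  # toy "clean" sequence the stand-in model predicts


def toy_denoiser(tokens):
    """Hypothetical stand-in for a trained network: predicts a clean token per position."""
    return list(TARGET)


def masked_step(tokens, rng, unmask_prob):
    """Mask-based update: only [MASK] positions may be filled; other tokens stay frozen."""
    pred = toy_denoiser(tokens)
    return [
        pred[i] if tok == MASK and rng.random() < unmask_prob else tok
        for i, tok in enumerate(tokens)
    ]


def mask_free_step(tokens, rng, jump_prob):
    """Mask-free update: every position, including earlier outputs, may jump to the prediction."""
    pred = toy_denoiser(tokens)
    return [
        pred[i] if rng.random() < jump_prob else tok
        for i, tok in enumerate(tokens)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # "blue" is an early mistake that has already been committed.
    print(masked_step(["a", "blue", MASK], rng, unmask_prob=1.0))
    print(mask_free_step(["a", "blue", "pear"], rng, jump_prob=1.0))
```

With both probabilities set to 1.0, the mask-based step fills the last slot but leaves the mistaken "blue" untouched, whereas the mask-free step replaces it with "red". A real sampler would use probabilities below 1 so that the sequence is refined gradually over many steps.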

FUDOKI is built on metric-induced probability paths with kinetic-optimal velocities. These techniques let the model take the semantic similarity between tokens into account during generation, resulting in more natural text and images. In addition, FUDOKI is initialized from pre-trained autoregressive models, which reduces training cost and improves efficiency.
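To make the idea of a metric-induced path more concrete, the following is a minimal sketch, not FUDOKI's implementation. It assumes the path takes the form of a softmax over negative embedding distance, p_t(x | x1) ∝ exp(-β(t) · d(x, x1)). The schedule `beta`, its constants, and the 2-D embeddings of the four toy tokens are all made up for illustration.

```python
import math

# Hypothetical toy vocabulary with made-up 2-D embeddings (not taken from FUDOKI).
EMBEDDINGS = {
    "cat": (0.0, 0.0),
    "kitten": (0.1, 0.0),
    "dog": (0.5, 0.0),
    "car": (3.0, 0.0),
}


def distance(a, b):
    """Euclidean distance between two tokens in the toy embedding space."""
    (ax, ay), (bx, by) = EMBEDDINGS[a], EMBEDDINGS[b]
    return math.hypot(ax - bx, ay - by)


def beta(t, c=3.0, a=1.0):
    """Assumed schedule: 0 at t = 0 and growing without bound as t approaches 1."""
    return c * (t / (1.0 - t)) ** a


def path_distribution(target, t):
    """p_t(x | target), proportional to exp(-beta(t) * d(x, target))."""
    b = beta(t)
    weights = {tok: math.exp(-b * distance(tok, target)) for tok in EMBEDDINGS}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


if __name__ == "__main__":
    for t in (0.0, 0.5, 0.9):
        probs = path_distribution("cat", t)
        print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

At t = 0 the distribution is uniform (pure noise). As t grows, probability mass concentrates on the target token, and at intermediate times tokens that are close to the target in embedding space ("kitten") stay more likely than distant ones ("car"). This is the sense in which the noising process respects semantic similarity.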

FUDOKI not only offers a new perspective on multi-modal generation and understanding but also lays a more solid foundation for the development of artificial general intelligence. We look forward to further exploration and breakthroughs building on FUDOKI that continue to advance AI technology.

Tags: Large Language Models, FUDOKI, Maskless Discrete Flow Matching, Huawei Noah's Ark Lab

This article is from AIbase Daily

