New GoT-R1 Multimodal Model Released: Making AI Drawing Smarter, the New Era of Image Generation!

AIbase基地

Published in AI News · 4 min read · Jun 26, 2025

A research team from the University of Hong Kong, The Chinese University of Hong Kong, and SenseTime has recently released a new framework, GoT-R1. By introducing reinforcement learning (RL), this multimodal large model strengthens AI's semantic and spatial reasoning in visual generation tasks, allowing it to generate high-fidelity, semantically consistent images from complex text prompts. The work marks another step forward for image generation technology.

Existing multimodal large models have made significant progress in generating images from text prompts, but they still struggle with instructions that involve precise spatial relationships or complex combinations of objects and attributes. GoT-R1 was created to address this problem. Compared with its predecessor GoT, GoT-R1 not only expands the model's reasoning capabilities but also lets it learn and optimize its reasoning strategies autonomously.

The core of GoT-R1 is its reinforcement learning mechanism. The team designed a comprehensive reward mechanism to help the model better understand complex user instructions during image generation. The reward covers multiple evaluation dimensions, including semantic consistency, accuracy of the spatial layout, and the overall aesthetic quality of the generated image. In addition, GoT-R1 visualizes the reasoning process, which allows the effectiveness of the image generation to be assessed more accurately.
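To make the idea of a multi-dimensional reward more concrete, here is a minimal sketch in Python. It is not the GoT-R1 implementation: the article does not say how the dimensions are scored or combined, so the `RewardScores` class, the `combined_reward` function, the weighted-average rule, and the weight values are all hypothetical. The sketch only shows how separate scores for semantic consistency, spatial layout, and aesthetic quality could be folded into the single scalar that an RL algorithm typically optimizes.

```python
from dataclasses import dataclass


@dataclass
class RewardScores:
    """Per-image scores in [0, 1], assumed to come from some external evaluator."""
    semantic_consistency: float
    spatial_accuracy: float
    aesthetic_quality: float


# Hypothetical weights: the article does not state how the dimensions are balanced.
DEFAULT_WEIGHTS = {
    "semantic_consistency": 0.4,
    "spatial_accuracy": 0.4,
    "aesthetic_quality": 0.2,
}


def combined_reward(scores: RewardScores, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Return the weighted average of the three scores (an assumed combination rule)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = (
        weights["semantic_consistency"] * scores.semantic_consistency
        + weights["spatial_accuracy"] * scores.spatial_accuracy
        + weights["aesthetic_quality"] * scores.aesthetic_quality
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    # An image that matches the prompt's meaning but misplaces some objects.
    example = RewardScores(
        semantic_consistency=0.9,
        spatial_accuracy=0.5,
        aesthetic_quality=0.8,
    )
    print(f"combined reward: {combined_reward(example):.3f}")
```

With a rule like this, an image that looks good but places objects incorrectly still earns a lower reward, which is the kind of pressure that would push a model toward respecting spatial instructions.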

In a comprehensive evaluation, the research team found that GoT-R1 performed very well on the T2I-CompBench benchmark, especially on complex, multi-level instructions, where it surpassed other mainstream models. In the benchmark's "complex" category, for example, its reasoning and generation capabilities helped it stand out, and the model achieved the highest scores in multiple evaluation categories.

The release of GoT-R1 injects new vitality into multimodal image generation and shows how much further AI can go in handling complex tasks. As the technology continues to develop, image generation is expected to become more intelligent and precise.

Paper: https://arxiv.org/pdf/2503.10639



This article is from AIbase Daily
