Alibaba Qwen Team Releases the Next-Generation GUI Automation Framework Mobile-Agent-v3 and GUI-Owl

AIbase

Published in AI News · 4 min read · Sep 1, 2025

Alibaba's Qwen team recently introduced two new systems, Mobile-Agent-v3 and GUI-Owl, which target a range of long-standing challenges in graphical user interface (GUI) automation.

Modern computing devices rely heavily on graphical user interfaces, yet earlier automation methods often depended on complex scripts and hand-written rules, with less-than-ideal results. GUI-Owl, a new multimodal agent model, is built on Qwen2.5-VL and further trained on a large volume of GUI interaction data to improve its task understanding and execution.

GUI-Owl is designed to cope with the diversity and constantly changing nature of real-world GUI environments. It combines perception, reasoning, planning, and execution in a single unified policy network, which lets it make multi-turn decisions on complex tasks, keep its reasoning process explicit, and adapt to changes in real applications.
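To make the idea of a single policy driving multi-turn decisions concrete, here is a minimal Python sketch. It is not GUI-Owl's actual interface: the `Action` and `Step` types, the three-screen `ToyPhone` environment, and the rule-based `toy_policy` (standing in for the model) are all hypothetical. The point is only the loop shape: one policy call per turn returns both its reasoning and an action, and the new screen is fed back in along with the history.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    kind: str                     # hypothetical action space: "click" or "done"
    target: Optional[str] = None  # label of the UI element to act on


@dataclass
class Step:
    observation: str  # what the agent saw (here just a screen name)
    reasoning: str    # the policy's explicit reasoning for this turn
    action: Action    # the action it chose


# One callable covers perception, reasoning, and action selection.
Policy = Callable[[str, str, List[Step]], Tuple[str, Action]]


def run_episode(task: str, policy: Policy, reset: Callable[[], str],
                step: Callable[[Action], str], max_turns: int = 10) -> List[Step]:
    """Multi-turn loop: observe, reason, act, then feed the new screen back in."""
    history: List[Step] = []
    obs = reset()
    for _ in range(max_turns):
        reasoning, action = policy(task, obs, history)
        history.append(Step(obs, reasoning, action))
        if action.kind == "done":
            break
        obs = step(action)
    return history


# Hypothetical toy environment: (current screen, tapped element) -> next screen.
SCREENS = {("home", "Settings"): "settings", ("settings", "Wi-Fi"): "wifi"}


class ToyPhone:
    def __init__(self) -> None:
        self.screen = "home"

    def reset(self) -> str:
        self.screen = "home"
        return self.screen

    def step(self, action: Action) -> str:
        # Unknown taps leave the screen unchanged.
        self.screen = SCREENS.get((self.screen, action.target), self.screen)
        return self.screen


def toy_policy(task: str, obs: str, history: List[Step]) -> Tuple[str, Action]:
    # Rule-based stand-in for the model, for illustration only.
    if obs == "wifi":
        return "The Wi-Fi page is open, so the task is complete.", Action("done")
    if obs == "settings":
        return "Wi-Fi lives under Settings; tap it.", Action("click", "Wi-Fi")
    return "Start from the Settings app.", Action("click", "Settings")


phone = ToyPhone()
trace = run_episode("Open Wi-Fi settings", toy_policy, phone.reset, phone.step)
print([s.action.target for s in trace])  # ['Settings', 'Wi-Fi', None]
```

Keeping the reasoning string in each `Step` mirrors the article's point about maintaining a clear reasoning process across turns; a real agent would receive screenshots rather than screen names.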

To supply high-quality training data, the team built a self-evolving data production pipeline. The pipeline generates realistic app-navigation workflows and validates them through human annotation, so that the generated data is authentic and useful. The team also applied several data synthesis strategies to broaden what the model learns, making it more adaptable and flexible during task execution.
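One way to picture a "self-evolving" pipeline is as a loop: the current model generates trajectories, only validated ones are kept, the model is retrained on the accepted data, and the stronger model then generates the next round. The Python sketch below shows that loop under heavy simplification. It is an assumption-laden illustration, not the team's pipeline: the model is a toy integer "skill level", tasks are integer difficulties, and `is_valid` merely stands in for the validation step (which the article attributes to human annotation). All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Trajectory = List[str]


def self_evolve(tasks: List[int], model: int,
                generate: Callable[[int, int], Trajectory],
                is_valid: Callable[[int, Trajectory], bool],
                train: Callable[[int, Dict[int, Trajectory]], int],
                rounds: int = 3) -> Tuple[int, Dict[int, Trajectory]]:
    """Alternate between generating data with the current model and retraining on it."""
    dataset: Dict[int, Trajectory] = {}
    for _ in range(rounds):
        for task in tasks:
            if task in dataset:           # already have a validated trajectory
                continue
            traj = generate(model, task)  # roll out the current model
            if is_valid(task, traj):      # stand-in for the validation step
                dataset[task] = traj
        model = train(model, dataset)     # retrain on everything accepted so far
    return model, dataset


# Toy components (hypothetical): the "model" is an int skill level.
def toy_generate(skill: int, difficulty: int) -> Trajectory:
    return ["step"] * difficulty if difficulty <= skill else ["stuck"]


def toy_is_valid(difficulty: int, traj: Trajectory) -> bool:
    return "stuck" not in traj and len(traj) == difficulty


def toy_train(skill: int, dataset: Dict[int, Trajectory]) -> int:
    # Toy rule: skill becomes one more than the hardest task solved so far.
    return max(dataset.keys(), default=skill - 1) + 1


final_model, data = self_evolve([1, 2, 3, 4], 1, toy_generate, toy_is_valid, toy_train)
print(final_model, sorted(data))  # 4 [1, 2, 3]
```

Each round unlocks one harder task because retraining on validated data raises the toy skill level, which is the feedback effect the term "self-evolving" refers to.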

The Mobile-Agent-v3 framework centers on multi-agent collaboration: it breaks complex tasks into sub-goals and dynamically updates its plan in response to execution feedback. Four specialized agents (a manager agent, a worker agent, a reflection agent, and a note agent) each take on a distinct role, which improves both the efficiency and the success rate of task execution.

In evaluations on multiple GUI automation benchmarks, GUI-Owl and Mobile-Agent-v3 performed strongly, particularly on cross-platform task completion.
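The division of labor among the four Mobile-Agent-v3 agents can be sketched as a simple control loop. In the Python sketch below, only the four roles come from the article; everything else is a hypothetical illustration. That includes the class interfaces, the fixed three-step plan, the "go back, then retry" replanning rule, and the `flaky` worker that fails a chosen sub-goal once. The manager plans, the worker executes one sub-goal at a time, the reflector judges each outcome, the notetaker keeps a running log, and a failure sends feedback back to the manager instead of ending the task.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class Manager:
    """Decomposes a task into sub-goals and revises the plan after failures."""

    def plan(self, task: str) -> List[str]:
        # Hypothetical fixed decomposition; a real manager would query a model.
        return ["open_app", "search_item", "add_to_cart"]

    def replan(self, remaining: List[str], failed: str) -> List[str]:
        # Toy update rule: insert a recovery step, then retry the failed sub-goal.
        return ["go_back", failed] + remaining


class Worker:
    """Executes one sub-goal; `flaky` makes chosen sub-goals fail a set number of times."""

    def __init__(self, flaky: Dict[str, int]) -> None:
        self.flaky = dict(flaky)

    def act(self, subgoal: str) -> str:
        if self.flaky.get(subgoal, 0) > 0:
            self.flaky[subgoal] -= 1
            return f"{subgoal}: unexpected popup"
        return f"{subgoal}: ok"


class Reflector:
    """Judges whether the worker's last action achieved its sub-goal."""

    def succeeded(self, outcome: str) -> bool:
        return outcome.endswith(": ok")


@dataclass
class Notetaker:
    """Keeps a running log of outcomes that later steps could consult."""

    notes: List[str] = field(default_factory=list)

    def record(self, outcome: str) -> None:
        self.notes.append(outcome)


def run(task: str, manager: Manager, worker: Worker, reflector: Reflector,
        notetaker: Notetaker, max_steps: int = 10) -> Tuple[List[str], List[str]]:
    plan = manager.plan(task)
    completed: List[str] = []
    steps = 0
    while plan and steps < max_steps:
        subgoal = plan.pop(0)
        outcome = worker.act(subgoal)
        notetaker.record(outcome)
        steps += 1
        if reflector.succeeded(outcome):
            completed.append(subgoal)
        else:
            # Feedback loop: the manager updates the plan instead of giving up.
            plan = manager.replan(plan, subgoal)
    return completed, notetaker.notes


completed, notes = run("Buy a phone case", Manager(),
                       Worker({"search_item": 1}), Reflector(), Notetaker())
print(completed)  # ['open_app', 'go_back', 'search_item', 'add_to_cart']
print(notes[1])   # search_item: unexpected popup
```

Separating judgment (`Reflector`) from execution (`Worker`) lets the manager react to a failed step without the worker needing to know about the overall plan. The `max_steps` cap keeps a persistently failing sub-goal from looping forever.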

These tools mark a significant step forward for Alibaba in general-purpose GUI automation and are expected to support a broader range of applications in the future.

Paper: https://arxiv.org/abs/2508.15144

GitHub: https://github.com/X-PLUG/MobileAgent

Key Points:
🌟 GUI-Owl is a multimodal agent model launched by Alibaba, integrating perception, reasoning, and execution capabilities to adapt to complex GUI environments.
🤖 The Mobile-Agent-v3 framework coordinates multiple collaborating agents and improves task execution efficiency through dynamic plan updates.
📊 These two products have shown outstanding performance in GUI automation benchmark tests, marking an important breakthrough for Alibaba in the field of automation.

This article is from AIbase Daily
