Today, the X-PLUG team officially released its latest project, Mobile-Agent-v3, on GitHub. Mobile-Agent-v3 is a cross-platform multi-agent framework built on GUI-Owl. It offers planning, progress management, reflection, and memory capabilities, and aims to improve GUI automation for users.

GUI-Owl, the base model of Mobile-Agent-v3, is a native end-to-end multimodal agent that integrates perception, grounding, reasoning, planning, and execution in a single model. Its design supports cross-platform interaction and multi-turn decision-making with explicit intermediate reasoning. For users, this means more stable performance on multi-step tasks.


The X-PLUG team noted that Mobile-Agent-v3 also strengthens exception handling and reflection, so that tasks can keep running when pop-ups or ads interrupt them. In addition, Mobile-Agent-v3 records key information during execution, which makes tasks that span multiple apps easier to complete.

Meanwhile, earlier work in the Mobile-Agent series, such as Mobile-Agent-v2 and PC-Agent, was accepted at NeurIPS 2024 and ICLR 2025, reflecting the project's recognition in academic research.

The X-PLUG team also provides supporting resources, including technical reports, demonstration videos, and code repositories, so that developers and researchers can explore Mobile-Agent in more depth. With these resources, users can try out Mobile-Agent's features and contribute to its further development and optimization.

Address: https://github.com/X-PLUG/MobileAgent