The field of computer use agents recently saw an exciting breakthrough. A research team from Shanghai Jiao Tong University and SII trained PC Agent-E, a new-generation open-source computer use agent, using only 312 human-annotated operation trajectories. The model's performance improved by as much as 241%, surpassing the well-known Claude 3.7 Sonnet and setting a new state of the art on Windows.
Since Anthropic released Claude Computer Use, the development of computer use agents has drawn wide attention. OpenAI subsequently released Operator, strengthening agent capabilities through reinforcement learning. The industry has generally assumed that reaching this level requires large amounts of trajectory data and complex reinforcement learning algorithms. The team from Shanghai Jiao Tong University and SII refuted this view with practical results: a small amount of high-quality data is enough to unleash an agent's potential.
The key to this study lies in making effective use of human operation trajectories. Using PC Tracker, a tool the team developed, the researchers collected 312 real operation trajectories in just one day. Each trajectory includes the task description, screen captures, and detailed records of keyboard and mouse actions, ensuring data accuracy. The team then performed "chain-of-thought completion" on these trajectories, reconstructing the thought process behind each action to make the data more complete.
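To make the data description concrete, here is a minimal sketch of what one recorded trajectory might look like. The field names (`task_description`, `screenshot_path`, `action`, `thought`) are illustrative assumptions, not the actual PC Tracker schema.

```python
# Hypothetical sketch of a recorded trajectory: a task description plus a
# sequence of steps, each with a screenshot, the keyboard/mouse action taken,
# and a "thought" field filled in during chain-of-thought completion.
from dataclasses import dataclass, field


@dataclass
class TrajectoryStep:
    screenshot_path: str  # screen capture taken before the action
    action: str           # e.g. 'click(52, 740)' or 'type("hello")'
    thought: str = ""     # reasoning added by chain-of-thought completion


@dataclass
class Trajectory:
    task_description: str
    steps: list = field(default_factory=list)


traj = Trajectory(task_description="Open Notepad and type a greeting")
traj.steps.append(TrajectoryStep(
    screenshot_path="step_0.png",
    action="click(52, 740)",
    thought="The Start menu gives access to Notepad.",
))
print(len(traj.steps))  # → 1
```

The point of the `thought` field is that the raw recording captures only *what* the human did; completion adds *why*, which is what the model is trained to produce before each action.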
To further enhance the model's performance, the team introduced a "trajectory enhancement" technique: for each step of an operation, Claude 3.7 Sonnet synthesizes multiple alternative yet reasonable action decisions. This increases the diversity of the trajectory data and significantly improves training efficiency. Ultimately, PC Agent-E performed excellently on the WindowsAgentArena-V2 benchmark, surpassing Claude 3.7 Sonnet even in its "extended thinking" mode.
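The augmentation idea can be sketched as follows. This is a rough illustration under stated assumptions, not the team's implementation: `propose_alternatives` is a stub standing in for a real call to Claude 3.7 Sonnet, and the dictionary keys are invented for the example.

```python
# Sketch of "trajectory enhancement": for every recorded step, ask a strong
# model for alternative but still-reasonable actions, so the same on-screen
# state yields several valid training targets instead of one.

def propose_alternatives(task, screenshot, original_action, n=3):
    # Stub: a real implementation would prompt Claude 3.7 Sonnet with the
    # task, the screenshot, and the recorded human action, asking it to
    # suggest n distinct but equally reasonable next actions.
    return [f"{original_action} # variant {i}" for i in range(n)]


def augment_trajectory(task, steps, n_alternatives=3):
    augmented = []
    for step in steps:
        variants = propose_alternatives(
            task, step["screenshot"], step["action"], n_alternatives
        )
        # Keep the original human action plus each synthesized alternative
        # as separate action candidates for the same state.
        augmented.append({**step, "actions": [step["action"]] + variants})
    return augmented


steps = [{"screenshot": "s0.png", "action": "click(52, 740)"}]
out = augment_trajectory("Open Notepad", steps)
print(len(out[0]["actions"]))  # → 4
```

The design choice here is to multiply supervision per state rather than collect more trajectories, which is one way a small dataset of 312 recordings could stretch further.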
The results show that powerful agents can be trained from a small amount of high-quality data, without massive annotation, pointing the way toward smarter digital agents. The team further argues that improving the quality of trajectory data can substantially reduce data requirements and advance agent autonomy.
Paper link: https://arxiv.org/abs/2505.13909
Code link: https://github.com/GAIR-NLP/PC-Agent-E
Model link: https://huggingface.co/henryhe0123/PC-Agent-E
Data link: https://huggingface.co/datasets/henryhe0123/PC-Agent-E