Aliyun Open-Sources WebAgent Project; WebShaper Exceeds Claude4-Sonnet on GAIA Evaluation

AIbase基地

Published in AI News · 9 min read · Jul 31, 2025

Alibaba Cloud's Tongyi Lab recently announced the open-source release of WebAgent, its self-developed search agent project. The flagship components WebShaper and WebSailor have attracted wide attention in the web agent field. Through end-to-end autonomous information retrieval and multi-step reasoning, WebAgent is reported to show web interaction capability that approaches, and on some tasks exceeds, human level.

WebAgent: An Agent That Simulates Human Search Behavior

WebAgent is an open-source AI agent developed by Alibaba Cloud's Tongyi Lab, designed to simulate the human cycle of perception, decision-making, and action on the web. Its core goal is to handle complex and ambiguous web tasks efficiently through autonomous search and multi-step reasoning. WebAgent includes multiple key components, among which WebSailor and WebShaper are the main technical innovations. According to official information, WebAgent can actively search academic databases, news websites, and professional forums, filter key information, and generate structured reports, making it applicable to scenarios such as academic research, business analysis, and daily queries.

On the BrowseComp evaluation set, the WebSailor-72B model performed particularly well, surpassing models such as DeepSeek R1 and Grok-3, ranking second only to OpenAI's DeepResearch, and topping the list of open-source web agents. WebAgent also scored 60.19 on GAIA and 52.2 on WebWalkerQA, demonstrating strong performance on complex tasks.

WebShaper: A New Paradigm for Data Synthesis Driven by Formalization

WebShaper is a core innovation in the WebAgent ecosystem. It proposes a "formalization-driven" data synthesis method aimed at the reasoning challenges AI faces in high-uncertainty tasks. WebShaper builds a mathematical representation of information-seeking tasks on set theory and uses the concept of "knowledge projection" to abstract a complex search process into operations on entity sets. For example, for a query such as "players born in the 1990s who played for East German football teams in the 2004-05 season," WebShaper can systematically generate training data, helping ensure accuracy in multi-step reasoning.
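To make the set-theoretic idea concrete, here is a minimal sketch of how a query like the one above could be expressed as operations on entity sets. It is an illustration only, not WebShaper's actual implementation: the `FACTS` triples, the relation names, and the `project` helper are all hypothetical, and the players and teams are placeholders rather than real data.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical toy facts; none of these names come from the WebShaper dataset.
FACTS: Set[Triple] = {
    ("player_a", "played_for_2004_05", "team_x"),
    ("player_b", "played_for_2004_05", "team_x"),
    ("player_c", "played_for_2004_05", "team_y"),
    ("team_x", "based_in", "east_germany"),
    ("team_y", "based_in", "west_germany"),
    ("player_a", "birth_decade", "1990s"),
    ("player_b", "birth_decade", "1980s"),
    ("player_c", "birth_decade", "1990s"),
}


def project(relation: str, targets: Set[str]) -> Set[str]:
    """Return every subject linked by `relation` to some member of `targets`."""
    return {s for (s, r, o) in FACTS if r == relation and o in targets}


# Step 1: teams based in East Germany.
east_german_teams = project("based_in", {"east_germany"})
# Step 2: players who played for any of those teams in 2004-05.
played_there = project("played_for_2004_05", east_german_teams)
# Step 3: players born in the 1990s.
born_1990s = project("birth_decade", {"1990s"})
# The answer set is the intersection of the two player sets.
answer = played_there & born_1990s
print(answer)
```

Each condition in the question becomes one projection, and combining conditions becomes a set intersection, so a harder question can be built by chaining or intersecting more projections.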

The WebShaper dataset covers multiple fields such as sports, academia, politics, and entertainment, with sports-related questions accounting for 21% and academic questions for 17%, which gives it broad knowledge coverage. Its layered expansion strategy avoids reasoning shortcuts and information redundancy, forcing the model to derive answers through complete reasoning paths. In experiments, models trained with WebShaper outperformed those trained on traditional datasets such as WebWalkerQA and E2HQA at the same data volume.

WebSailor: The "Super Network Detective" in Complex Tasks

As the "brain" of WebAgent, WebSailor is a large language model responsible for understanding user intent, formulating browsing strategies, and deciding on operational steps. Its latest version, WebSailor-72B, supports one-click deployment through Alibaba Cloud's FunctionAI, allowing users to complete configuration in just 10 minutes and significantly lowering the barrier to use. WebSailor excels in high-uncertainty tasks, such as ambiguous queries or complex scenarios that require integrating information across platforms.

WebSailor was trained on the SailorFog-QA dataset, which simulates the complex knowledge graphs of real-world web environments through subgraph sampling and information obfuscation. This approach equips the model for tasks described as "superhuman": in the BrowseComp test, the WebSailor-32B and 72B versions not only led all open-source models but also surpassed some closed-source systems.
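The two techniques named above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the SailorFog-QA pipeline: the `GRAPH` contents, the `sample_subgraph` and `fuzz` helpers, and the rule of blurring a year into a decade are all hypothetical choices made for the example.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical toy knowledge graph; every entity and relation is made up.
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "composer_x": [("born_in", "1873"), ("wrote", "opera_y")],
    "opera_y": [("premiered_in", "city_z"), ("premiered_year", "1901")],
    "city_z": [("located_in", "country_w")],
}


def sample_subgraph(start: str, max_edges: int, rng: random.Random) -> List[Edge]:
    """Grow a connected subgraph from `start` by expanding random frontier nodes."""
    edges: List[Edge] = []
    frontier = [start]
    seen = {start}
    while frontier and len(edges) < max_edges:
        node = frontier.pop(rng.randrange(len(frontier)))
        for relation, tail in GRAPH.get(node, []):
            if len(edges) >= max_edges:
                break
            edges.append((node, relation, tail))
            if tail not in seen:
                seen.add(tail)
                frontier.append(tail)
    return edges


def fuzz(value: str) -> str:
    """Blur a four-digit year into its decade; leave other values unchanged."""
    if value.isdigit() and len(value) == 4:
        return f"the {value[:3]}0s"
    return value


subgraph = sample_subgraph("composer_x", max_edges=4, rng=random.Random(0))
fuzzed = [(h, r, fuzz(t)) for (h, r, t) in subgraph]
for edge in fuzzed:
    print(edge)
```

A question written from the fuzzed edges (for example, one that mentions only a decade rather than an exact year) cannot be answered with a single lookup, which is the kind of uncertainty the article says the dataset is designed to create.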

WebDancer and WebWalker: Building a Complete Ecosystem

WebAgent's success also relies on two other major modules: WebDancer and WebWalker. WebDancer is an end-to-end agent training framework that strengthens multi-step search capability through four training stages (data construction, trajectory sampling, supervised fine-tuning, and reinforcement learning). Its latest version, WebDancer-QwQ-32B, scored 64.1% in the GAIA Pass@3 evaluation. WebWalker, on the other hand, is a benchmark for evaluating how language models perform in complex web traversal, giving developers a standardized evaluation system for optimizing their algorithms.
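For readers unfamiliar with the Pass@3 metric, the sketch below shows one common way to compute it, assuming that a task counts as solved when at least one of its first k attempts is correct. The article does not spell out the exact protocol, so treat this reading as an assumption; the `pass_at_k` function and the `runs` data are hypothetical.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks with at least one correct result among the first k attempts."""
    solved = sum(1 for outcomes in attempts.values() if any(outcomes[:k]))
    return solved / len(attempts)


# Made-up per-task outcomes for three attempts each (True = correct answer).
runs = {
    "task_1": [False, True, False],
    "task_2": [False, False, False],
    "task_3": [True, True, True],
    "task_4": [False, False, True],
}

pass_at_1 = pass_at_k(runs, k=1)
pass_at_3 = pass_at_k(runs, k=3)
print(pass_at_1, pass_at_3)
```

Under this definition Pass@3 can never be lower than Pass@1 on the same runs, so scores reported with different k values are not directly comparable.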

WebAgent's hybrid reasoning mode dynamically allocates computing resources through a "thought budget mechanism," balancing quick responses to simple queries and deep reasoning for complex tasks. In practical applications, WebAgent can complete the crawling and analysis of Tesla and XPeng car configuration tables within 10 minutes, or extract clinical trial data from databases like PubMed and generate traceable reports, far exceeding manual efficiency.

Open Source Significance: Reshaping Information Processing and Community Innovation

Open-sourcing WebAgent not only reduces costs for enterprises and developers but also gives the global AI community an industrial-grade training framework and evaluation standards. Its GitHub repository (https://github.com/Alibaba-NLP/WebAgent) has received over 4,000 stars and ranked first on GitHub trending and third on the Hugging Face monthly list. WebSailor's training strategy (high-difficulty task synthesis, small-scale cold start, and efficient reinforcement learning optimization) offers valuable insights for the open-source community in tackling complex reasoning tasks.

From academic research to business decisions, WebAgent has significant application potential. For example, researchers can use it to quickly survey the topics of ACL 2025 papers, business users can analyze AI chip market trends in 2025, and ordinary users can get personalized recommendations for travel planning or health consultations. The open-source release of WebAgent marks the transition of AI agents from technical demonstrations to production scenarios, and it is expected to drive further breakthroughs in cross-modal information integration and open-domain reasoning.

GitHub: https://github.com/Alibaba-NLP/WebAgent

Hugging Face: https://huggingface.co/datasets/Alibaba-NLP/WebShaper

ModelScope: https://modelscope.cn/datasets/iic/WebShaper

This article is from AIbase Daily
