Alibaba Cloud's Tongyi Lab recently announced the open-source release of WebAgent, its in-house search agent project. Two flagship components, WebShaper and WebSailor, have drawn wide attention in the web agent field. Through end-to-end autonomous information retrieval and multi-step reasoning, WebAgent demonstrates web interaction capabilities that approach or even exceed human level.
WebAgent: An Agent That Simulates Human Search Behavior
WebAgent is an open-source AI agent developed by Alibaba Cloud's Tongyi Lab, designed to simulate the human cycle of perception, decision-making, and action in a web environment. Its core goal is to handle complex and ambiguous web tasks efficiently through autonomous search and multi-step reasoning. WebAgent comprises several key components, of which WebSailor and WebShaper are the main technical innovations. According to official information, WebAgent can actively search academic databases, news websites, and professional forums, filter out key information, and generate structured reports, making it applicable to scenarios such as academic research, business analysis, and everyday queries.
On the authoritative BrowseComp evaluation set, the WebSailor-72B model performed particularly well, surpassing models such as DeepSeek R1 and Grok-3, ranking second only to OpenAI's DeepResearch, and topping the list of open-source web agents. WebAgent also scored 60.19 on GAIA and 52.2 on WebWalkerQA, demonstrating strong performance on complex tasks.
WebShaper: A New Paradigm for Data Synthesis Driven by Formalization
WebShaper is a core innovation in the WebAgent ecosystem. It proposes a formalization-driven data synthesis method that addresses the reasoning challenges AI faces in high-uncertainty tasks. WebShaper builds a mathematical representation of information-seeking tasks using set theory, and uses the concept of "knowledge projection" to abstract a complex search process into operations on sets of entities. For example, for the query "players born in the 1990s who played for East German football teams in the 2004-05 season," WebShaper can systematically generate training data that preserves accuracy across the multi-step reasoning chain.
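The idea of treating a search task as operations on entity sets can be illustrated with a minimal Python sketch. This is not WebShaper's actual implementation: the `project` helper, the relation names, and the toy triples are all hypothetical placeholders, chosen only to mirror the football example above.

```python
from typing import Set, Tuple

# Toy (subject, relation, object) triples. All names and facts here are
# hypothetical placeholders, not real data and not WebShaper's format.
Triple = Tuple[str, str, str]

TRIPLES: Set[Triple] = {
    ("player_a", "played_for_2004_05", "club_x"),
    ("player_b", "played_for_2004_05", "club_x"),
    ("player_c", "played_for_2004_05", "club_y"),
    ("player_a", "born_in", "1991"),
    ("player_b", "born_in", "1985"),
    ("player_c", "born_in", "1993"),
    ("club_x", "located_in", "east_germany"),
    ("club_y", "located_in", "elsewhere"),
}


def project(relation: str, targets: Set[str]) -> Set[str]:
    """Assumed form of a knowledge projection: every subject that is linked
    by `relation` to at least one entity in `targets`."""
    return {s for (s, r, o) in TRIPLES if r == relation and o in targets}


# Step 1: clubs located in East Germany.
east_german_clubs = project("located_in", {"east_germany"})
# Step 2: players who played for any of those clubs in 2004-05.
played = project("played_for_2004_05", east_german_clubs)
# Step 3: players born in any year of the 1990s.
nineties = project("born_in", {str(year) for year in range(1990, 2000)})
# Final answer: the intersection of the two player sets.
answer = played & nineties
print(answer)
```

Each projection corresponds to one retrieval step, and the intersection combines the constraints, so a question built this way cannot be answered without walking the whole chain.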
The WebShaper dataset covers multiple fields, including sports, academia, politics, and entertainment. Sports questions account for 21% and academic questions for 17%, giving the dataset broad knowledge coverage. Its layered expansion strategy avoids reasoning shortcuts and redundant information, forcing the model to derive answers through a complete reasoning path. In experiments, models trained on WebShaper outperformed models trained on the same volume of data from existing datasets such as WebWalkerQA and E2HQA.
WebSailor: The "Super Network Detective" in Complex Tasks
As the "brain" of WebAgent, WebSailor is a large language model responsible for understanding user intent, formulating browsing strategies, and deciding on operational steps. Its latest version, WebSailor-72B, enables one-click deployment through Alibaba Cloud's FunctionAI, allowing users to complete configuration in just 10 minutes, significantly lowering the usage barrier. WebSailor excels in high-uncertainty tasks, such as handling ambiguous queries or complex scenarios requiring cross-platform information integration.
WebSailor was trained on the novel SailorFog-QA dataset, which simulates the complex knowledge graphs of real-world web environments through subgraph sampling and information fuzzification. This approach lets the model handle tasks of "superhuman" difficulty: on BrowseComp, the WebSailor-32B and 72B versions not only led all open-source models but also surpassed some closed-source systems.
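As a rough illustration of what subgraph sampling and information fuzzification could look like, here is a small Python sketch. It is an assumption-laden toy, not the SailorFog-QA pipeline: the graph, the `sample_subgraph` and `fuzzify` helpers, and the rule of blurring exact years into decades are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy knowledge graph: node -> list of (relation, neighbor).
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "scientist_a": [("born_in_year", "1912"), ("worked_at", "institute_x")],
    "institute_x": [("founded_in_year", "1887"), ("located_in", "city_y")],
    "city_y": [("population_rank", "3")],
}

Edge = Tuple[str, str, str]


def sample_subgraph(start: str, max_edges: int, rng: random.Random) -> List[Edge]:
    """Grow a subgraph from `start` by expanding randomly chosen frontier
    nodes until `max_edges` edges have been collected."""
    edges: List[Edge] = []
    frontier = [start]
    seen = {start}
    while frontier and len(edges) < max_edges:
        node = frontier.pop(rng.randrange(len(frontier)))
        for relation, neighbor in GRAPH.get(node, []):
            if len(edges) >= max_edges:
                break
            edges.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return edges


def fuzzify(value: str) -> str:
    """Blur an exact four-digit year into a decade; leave other values as-is."""
    if value.isdigit() and len(value) == 4:
        return f"the {value[:3]}0s"
    return value


edges = sample_subgraph("scientist_a", 4, random.Random(0))
fuzzy_edges = [(s, r, fuzzify(o)) for (s, r, o) in edges]
for edge in fuzzy_edges:
    print(edge)
```

A question written from the fuzzified edges ("a scientist born in the 1910s who worked at an institute founded in the 1880s") no longer contains a directly searchable fact, which is the kind of uncertainty the article attributes to this dataset.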
WebDancer and WebWalker: Building a Complete Ecosystem
WebAgent's success also rests on two other major modules: WebDancer and WebWalker. WebDancer is an end-to-end agent training framework that strengthens multi-step search capability through four training stages: data construction, trajectory sampling, supervised fine-tuning, and reinforcement learning. Its latest version, WebDancer-QwQ-32B, scored 64.1% on the GAIA Pass@3 evaluation. WebWalker, by contrast, is a benchmark for evaluating how well language models handle complex web traversal, giving developers a standardized evaluation system for optimizing their algorithms.
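For readers unfamiliar with the Pass@3 metric mentioned above, the following Python sketch shows one common reading of it: a question counts as solved if any of the first k attempts is correct. The exact protocol used in the GAIA evaluation is not described here, so this definition, the `pass_at_k` name, and the sample results are assumptions for illustration only.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of questions with at least one correct answer among the
    first k attempts (a simple, assumed reading of Pass@k)."""
    if not results:
        return 0.0
    solved = sum(1 for attempts in results.values() if any(attempts[:k]))
    return solved / len(results)


# Hypothetical per-question outcomes for three attempts each.
runs = {
    "q1": [False, True, False],
    "q2": [False, False, False],
    "q3": [True, True, True],
    "q4": [False, False, True],
}

score_at_1 = pass_at_k(runs, 1)
score_at_3 = pass_at_k(runs, 3)
print(f"Pass@1 = {score_at_1:.2f}, Pass@3 = {score_at_3:.2f}")
```

Because extra attempts can only add solved questions, Pass@3 is never lower than Pass@1 on the same runs, which is worth keeping in mind when comparing scores reported under different values of k.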
WebAgent's hybrid reasoning mode allocates computing resources dynamically through a "thought budget mechanism," giving quick responses to simple queries while reserving deep reasoning for complex tasks. In practical applications, WebAgent can crawl and analyze the configuration tables of Tesla and XPeng cars within 10 minutes, or extract clinical trial data from databases such as PubMed and generate traceable reports, far faster than manual work.
Open Source Significance: Reshaping Information Processing and Community Innovation
Open-sourcing WebAgent not only lowers costs for enterprises and developers but also gives the global AI community an industrial-grade training framework and evaluation standards. Its GitHub repository (https://github.com/Alibaba-NLP/WebAgent) has received over 4,000 stars and ranked first on GitHub Trending and third on Hugging Face's monthly ranking. WebSailor's training strategy, which combines high-difficulty task synthesis, a small-scale cold start, and efficient reinforcement learning optimization, offers valuable insights for the open-source community in tackling complex reasoning tasks.
From academic research to business decisions, WebAgent has significant application potential. For example, researchers can use it to quickly survey the topics of ACL 2025 papers, business users can analyze AI chip market trends in 2025, and ordinary users can get personalized recommendations for travel planning or health consultations. The open-source release of WebAgent marks the transition of AI agents from technical demonstrations to production scenarios, and it is expected to drive further breakthroughs in cross-modal information integration and open-domain reasoning.
GitHub: https://github.com/Alibaba-NLP/WebAgent
Hugging Face: https://huggingface.co/datasets/Alibaba-NLP/WebShaper
ModelScope: https://modelscope.cn/datasets/iic/WebShaper