Alibaba's Tongyi Lab recently released WebShaper, the fourth open-source tool in its WebAgent series. The framework has drawn industry attention for its "formal-driven" information retrieval paradigm. According to information AIbase gathered from social media and other channels, an open-source model trained on WebShaper data scored 60.19 on the GAIA benchmark, surpassing Claude 3.5 Sonnet and GPT-4o. Its new data generation method is also reported to significantly improve AI's information retrieval and reasoning in complex tasks.

 From Information-Driven to Formal-Driven: A Breakthrough in Paradigm

Traditional information retrieval methods (information seeking, or IS, in the project's terminology) are typically "information-driven": data is collected first and questions are built on top of it. This often leads to a mismatch between the structure of the information and the reasoning logic, as well as limited knowledge coverage, so AI performs poorly on open-ended complex tasks. WebShaper introduces a new "formal-driven" paradigm that redefines data generation and model training by formalizing tasks systematically.

The core of the framework is a logically explicit, structured generation approach that keeps the knowledge structure and the reasoning structure of the training data semantically consistent. AIbase learned that WebShaper uses an "Agentic Expander" to iteratively generate and verify questions, which keeps the data generation process controllable and well organized. This improves data quality and, in turn, the model's performance on complex information retrieval tasks.
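The iterative generate-and-verify idea can be illustrated with a minimal Python sketch. This is not WebShaper's actual implementation: the `Question` type, the `expand_iteratively` function, and the toy `toy_expand` and `toy_verify` callbacks are all hypothetical names chosen for illustration. In a real system, the expand and verify steps would presumably be carried out by an agent with access to search tools.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Question:
    """A question plus the number of expansion rounds applied to it."""
    text: str
    depth: int


def expand_iteratively(
    seed: Question,
    expand: Callable[[Question], Question],
    verify: Callable[[Question], bool],
    rounds: int,
) -> List[Question]:
    """Repeatedly expand a question, keeping only candidates that pass verification.

    Returns every accepted version, starting with the seed.
    """
    accepted = [seed]
    current = seed
    for _ in range(rounds):
        candidate = expand(current)
        if verify(candidate):
            # Accepted: the next round builds on the harder question.
            current = candidate
            accepted.append(candidate)
        # Rejected candidates are dropped; the next round retries from `current`.
    return accepted


# Toy stand-ins (assumptions): a real expander would add a new reasoning hop,
# and a real verifier would check answerability and correctness.
def toy_expand(q: Question) -> Question:
    return Question(text=f"{q.text} -> hop{q.depth + 1}", depth=q.depth + 1)


def toy_verify(q: Question) -> bool:
    return q.depth <= 2


history = expand_iteratively(Question("seed", 0), toy_expand, toy_verify, rounds=4)
for q in history:
    print(q.depth, q.text)
```

Separating the expand step from the verify step means a failed expansion never contaminates the dataset: only verified questions are carried forward into later rounds.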

 Strong GAIA Results: 60.19 Points, Leading Open-Source Models

WebShaper's results are strong. On the GAIA benchmark, an open-source model trained on the WebShaper dataset scored 60.19, surpassing Claude 3.5 Sonnet and GPT-4o and setting a new mark for open-source models. GAIA evaluates the general capabilities of AI, covering tasks such as multimodal processing, web browsing, and complex reasoning, and its difficulty places strict demands on a model's overall ability.

Additionally, WebShaper achieved an excellent score of 52.50 on the WebWalkerQA benchmark test, demonstrating its strong capabilities in web traversal and information retrieval tasks. AIbase believes that this achievement not only proves WebShaper's technological leadership but also injects new vitality into the open-source AI community.

 WebShaper Dataset: A New Training Paradigm Driven by Logic

One of WebShaper's core innovations is its dataset generation framework. Unlike traditional, ad hoc data collection, WebShaper generates information retrieval task instances systematically through a formal-driven approach. AIbase learned that the framework can generate structured training data according to task requirements while keeping the knowledge and the reasoning logic semantically consistent, so that AI handles open-ended questions more accurately and efficiently.
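To make the idea of a formally specified task more concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the formalism WebShaper actually uses (the article does not describe it in detail): a task is modeled as a set of constraints on an unknown entity, and the answer is the intersection of the entity sets that satisfy each constraint. The `Constraint` and `FormalTask` classes and the toy knowledge base `KB` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# Toy knowledge base (invented data): (relation, value) -> entities that satisfy it.
KB: Dict[Tuple[str, str], Set[str]] = {
    ("born_in", "1990"): {"alice", "bob"},
    ("plays_for", "team_x"): {"bob", "carol"},
    ("plays_for", "team_y"): {"alice"},
}


@dataclass(frozen=True)
class Constraint:
    """One condition the target entity must satisfy."""
    relation: str
    value: str


@dataclass(frozen=True)
class FormalTask:
    """A task defined by its constraints, so the reasoning structure is explicit."""
    constraints: Tuple[Constraint, ...]

    def solve(self, kb: Dict[Tuple[str, str], Set[str]]) -> Set[str]:
        """Answer = intersection of the entity sets matching each constraint."""
        matches = [kb.get((c.relation, c.value), set()) for c in self.constraints]
        if not matches:
            return set()
        return set.intersection(*matches)

    def to_question(self) -> str:
        """Render the formal task as a natural-language question."""
        parts = [f"{c.relation} = {c.value}" for c in self.constraints]
        return "Which entity satisfies: " + " and ".join(parts) + "?"


task = FormalTask((Constraint("born_in", "1990"), Constraint("plays_for", "team_x")))
print(task.to_question())
print(task.solve(KB))
```

Because the question text is derived from the formal specification rather than the other way around, the reasoning steps needed to answer it are known by construction.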

A related example from the same series is SailorFog-QA (introduced with WebSailor), a question-answering dataset with high uncertainty and high difficulty, designed to test model performance in complex scenarios. It is generated using graph sampling and information blurring techniques. Social media feedback indicates that developers have praised the logic and controllability of this dataset and consider it a more reliable foundation for AI model training.
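The two techniques named above, graph sampling and information blurring, can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline used to build SailorFog-QA: the toy `GRAPH`, the random-walk sampler `sample_path`, and the decade-level `blur_year` rule are all assumptions made for the example.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str]          # (relation, target node)
Triple = Tuple[str, str, str]   # (source node, relation, target node)

# Toy knowledge graph; every entity and relation here is invented.
GRAPH: Dict[str, List[Edge]] = {
    "author_a": [("wrote", "book_b"), ("born_in_year", "1974")],
    "book_b": [("published_in_year", "1999"), ("adapted_into", "film_c")],
    "film_c": [("released_in_year", "2003")],
}


def sample_path(
    graph: Dict[str, List[Edge]], start: str, steps: int, rng: random.Random
) -> List[Triple]:
    """Graph sampling: take a random walk of up to `steps` edges from `start`."""
    path: List[Triple] = []
    node = start
    for _ in range(steps):
        edges = graph.get(node, [])
        if not edges:
            break  # reached a leaf such as a year value
        relation, target = rng.choice(edges)
        path.append((node, relation, target))
        node = target
    return path


def blur_year(year: int) -> str:
    """Information blurring: replace an exact year with its decade."""
    decade = year // 10 * 10
    return f"sometime in the {decade}s"


def blur_triple(triple: Triple) -> Triple:
    """Blur the target of a triple when it looks like a four-digit year."""
    source, relation, target = triple
    if target.isdigit() and len(target) == 4:
        return (source, relation, blur_year(int(target)))
    return triple


rng = random.Random(0)  # fixed seed so the walk is reproducible
path = sample_path(GRAPH, "author_a", steps=3, rng=rng)
blurred = [blur_triple(t) for t in path]
for triple in blurred:
    print(triple)
```

Blurring removes the exact values a model could look up directly, so a question built from the blurred facts has to be answered by combining several clues rather than by a single search.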

 The Continuous Evolution of the WebAgent Ecosystem: Open Source and Community-Driven

WebShaper is the latest achievement of Alibaba's Tongyi Lab in the WebAgent series, which also includes WebWalker, WebDancer, and WebSailor. These tools collectively aim to build autonomous information retrieval and processing capabilities, covering various scenarios such as academic research, market analysis, and daily queries. AIbase noticed that the WebAgent project has already received over 4,000 stars on GitHub, showing widespread attention and support from the open-source community (https://www.kdjingpai.com/en/webagent/).

The open-source nature of WebShaper further promotes community innovation. Developers can freely access the code and part of the datasets, and can tune model performance by adjusting hyperparameters or by applying reinforcement learning methods such as DUPO. In addition, WebAgent provides interactive demonstrations for tasks such as WebWalkerQA and GAIA, so users can try the model's capabilities directly. AIbase expects that as the community continues to contribute, WebShaper and its related tools will show their potential in more scenarios.

 Future Outlook: Driving AI Toward General Intelligence

The release of WebShaper marks an important advancement in the field of information retrieval. Its formal-driven paradigm offers new possibilities for AI to handle complex tasks. AIbase learned that Alibaba's Tongyi Lab plans to further expand the functions of the WebAgent series, such as optimizing multimodal processing capabilities, supporting a wider range of languages and scenarios, and even exploring deployment methods for remote access to high-performance models.

On social media, developers have generally given positive reviews of WebShaper, describing it as "logically clear and high-performing," especially in tasks requiring multi-step reasoning and cross-modal understanding. AIbase believes that WebShaper not only enhances the competitiveness of open-source models but also lays an important foundation for the development of artificial general intelligence (AGI).

Conclusion  

With its innovative formal-driven paradigm and outstanding performance on the GAIA benchmark, WebShaper from Alibaba's Tongyi Lab has redefined the boundaries of information retrieval tasks. AIbase will continue to track the latest developments of the WebAgent series and bring you more cutting-edge AI technology news. Let us witness together how open-source AI, driven by logical thinking and community collaboration, moves toward a new era of general intelligence!

Project Address: https://github.com/Alibaba-NLP/WebAgent