Alibaba Releases WebShaper: Outperforms Claude 3.5 Sonnet and GPT-4o on GAIA

AIbase基地

Published in AI News · 9 min read · Jul 31, 2025

Alibaba's Tongyi Lab recently released WebShaper, the fourth open-source tool in its WebAgent series. The framework has drawn industry attention for its "formal-driven" paradigm for information-seeking tasks. According to information AIbase gathered from social media and other channels, a model trained on WebShaper data scored 60.19 on the GAIA benchmark, surpassing Claude 3.5 Sonnet and GPT-4o, and the framework's new data generation method is reported to significantly improve AI information retrieval and reasoning on complex tasks.

From Information-Driven to Formal-Driven: A Paradigm Shift

Traditional information-seeking (IS) methods are usually "information-driven": data is collected first and questions are built from it afterward. This often leaves the information structure misaligned with the reasoning logic and limits knowledge coverage, so models perform poorly on open-ended, complex tasks. WebShaper introduces a "formal-driven" paradigm that redefines data generation and model training through systematic task formalization.

The core idea is to generate training data in a structured, logically explicit way so that its knowledge structure and its reasoning structure stay semantically consistent. AIbase learned that WebShaper uses an "Agentic Expander" to iteratively generate and verify questions, which keeps the data generation process controllable and well organized. This approach is reported to improve data quality and to significantly strengthen model performance on complex information retrieval tasks.
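The iterative generate-and-verify idea described above can be illustrated with a minimal Python sketch. It is not WebShaper's implementation: the `Question` record, the `expand_iteratively` function, and the toy `toy_expand` and `toy_verify` callbacks are hypothetical names invented for this example, and a real expander would call a language model and web tools where the toy functions only manipulate strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Question:
    """Hypothetical question record: text, answer, and the facts it depends on."""
    text: str
    answer: str
    facts: List[str] = field(default_factory=list)


def expand_iteratively(
    seed: Question,
    expand: Callable[[Question], Question],
    verify: Callable[[Question], bool],
    rounds: int = 3,
) -> Question:
    """Run up to `rounds` expansions, keeping a candidate only if it verifies."""
    current = seed
    for _ in range(rounds):
        candidate = expand(current)
        if verify(candidate):
            current = candidate  # accept the harder, still-valid question
        # otherwise keep the last verified question and try again
    return current


def toy_expand(q: Question) -> Question:
    """Toy stand-in for an expander: add one more constraint to the question."""
    hop = len(q.facts) + 1
    return Question(
        text=f"{q.text} [+constraint {hop}]",
        answer=q.answer,
        facts=q.facts + [f"fact-{hop}"],
    )


def toy_verify(q: Question) -> bool:
    """Toy stand-in for a verifier: the answer is present and no fact repeats."""
    return bool(q.answer) and len(set(q.facts)) == len(q.facts)


seed = Question(text="Which lab released the framework?", answer="Tongyi Lab")
result = expand_iteratively(seed, toy_expand, toy_verify, rounds=3)
print(result.text)
print(result.facts)
```

Separating `expand` from `verify` is the point of the sketch: a rejected candidate never replaces the last verified question, which is one simple way to keep generation controllable.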

Strong GAIA Results: 60.19 Points, Leading Open-Source Models

WebShaper's results are strong. On the GAIA benchmark, an open-source model trained on the WebShaper dataset scored 60.19, surpassing Claude 3.5 Sonnet and GPT-4o and setting a new mark for open-source models. GAIA evaluates the general capabilities of AI systems across tasks such as multimodal processing, web browsing, and complex reasoning, and its difficulty demands broad competence from a model.

Additionally, WebShaper scored 52.50 on the WebWalkerQA benchmark, showing strong capability in web traversal and information retrieval tasks. AIbase believes these results demonstrate WebShaper's technical strength and bring new vitality to the open-source AI community.

WebShaper Dataset: A New Training Paradigm Driven by Logic

One of WebShaper's core innovations is its dataset generation framework. Instead of collecting data in an unstructured way, WebShaper systematically generates information-seeking task instances through its formal-driven approach. AIbase learned that the framework can generate structured training data according to task requirements while keeping knowledge and reasoning logic semantically consistent, which helps models answer open-ended questions more accurately and efficiently.

A related example from the same series is SailorFog-QA, a dataset introduced with WebSailor: a high-uncertainty, high-difficulty question-answering benchmark designed to test model performance in complex scenarios. It is generated using graph sampling and information blurring techniques. Social media feedback indicates that developers have praised the logic and controllability of this kind of dataset, believing it provides a more reliable foundation for AI model training.
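The two techniques named above, graph sampling and information blurring, can be illustrated with a toy Python sketch. It is an assumption-laden illustration, not the method used to build SailorFog-QA: the tiny `GRAPH`, the random-walk sampler `sample_path`, and the `blur` rule that turns an exact year into a decade are all hypothetical and chosen only to show the general idea.

```python
import random
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy knowledge graph: entity -> list of (relation, neighbor).
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "Author A": [("wrote", "Book B")],
    "Book B": [("published_in", "1987"), ("set_in", "City C")],
    "City C": [("located_in", "Country D")],
    "1987": [],
    "Country D": [],
}


def sample_path(
    graph: Dict[str, List[Tuple[str, str]]],
    start: str,
    max_hops: int,
    rng: random.Random,
) -> List[Triple]:
    """Graph sampling: random walk from `start`, returning the visited triples."""
    path: List[Triple] = []
    node = start
    for _ in range(max_hops):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: stop the walk early
        relation, tail = rng.choice(edges)
        path.append((node, relation, tail))
        node = tail
    return path


def blur(value: str) -> str:
    """Information blurring: replace a four-digit year with its decade."""
    if value.isdigit() and len(value) == 4:
        return f"the {value[:3]}0s"
    return value


rng = random.Random(0)  # fixed seed so the walk is reproducible
path = sample_path(GRAPH, "Author A", max_hops=3, rng=rng)
blurred = [(head, relation, blur(tail)) for head, relation, tail in path]
for triple in blurred:
    print(triple)
```

A question built from the blurred triples gives the solver a vaguer clue (a decade instead of a year), so it has to combine several hops of evidence rather than match a single exact fact.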

The Continuous Evolution of the WebAgent Ecosystem: Open Source and Community-Driven

WebShaper is the latest release from Alibaba's Tongyi Lab in the WebAgent series, which also includes WebWalker, WebDancer, and WebSailor. Together, these tools aim to build autonomous information retrieval and processing capabilities for scenarios such as academic research, market analysis, and everyday queries. AIbase noticed that the WebAgent project has already received over 4,000 stars on GitHub, showing broad attention and support from the open-source community (https://www.kdjingpai.com/en/webagent/).

WebShaper's open-source release further encourages community innovation. Developers can freely access the code and part of the datasets, and can tune model performance by adjusting hyperparameters or applying reinforcement learning methods such as DUPO. WebAgent also provides interactive demos for tasks such as WebWalkerQA and GAIA, letting users try the models directly. AIbase expects that, as community contributions continue, WebShaper and its related tools will show their potential in more scenarios.

Future Outlook: Driving AI Toward General Intelligence

The release of WebShaper marks an important advance in information retrieval, and its formal-driven paradigm opens new possibilities for AI on complex tasks. AIbase learned that Alibaba's Tongyi Lab plans to further expand the WebAgent series, for example by improving multimodal processing, supporting more languages and scenarios, and exploring deployment options for remote access to high-performance models.

On social media, developers have generally reviewed WebShaper positively, describing it as "logically clear and performance excellent," especially on tasks that require multi-step reasoning and cross-modal understanding. AIbase believes that WebShaper not only makes open-source models more competitive but also lays important groundwork for the development of artificial general intelligence (AGI).

Conclusion

With its formal-driven paradigm and strong performance on the GAIA benchmark, WebShaper from Alibaba's Tongyi Lab has pushed the boundaries of information retrieval tasks. AIbase will continue to track the WebAgent series and report on the latest AI technology news. Let us see together how open-source AI, driven by logical thinking and community collaboration, moves toward a new era of general intelligence!

Project Address: https://github.com/Alibaba-NLP/WebAgent

This article is from AIbase Daily
