With the rapid growth of the Internet, the explosion of available information has made information retrieval increasingly challenging. To address these challenges, Alibaba's Tongyi Lab has released WebSailor, an innovative open-source AI agent framework. Thanks to its strong performance, particularly on complex tasks, the project has received over 5,000 stars on GitHub and has been among the projects with the highest daily growth rate.

Outstanding Performance of WebSailor

The WebSailor team evaluated the framework on multiple benchmarks. On BrowseComp-en and BrowseComp-zh, WebSailor outperformed all existing open-source agents and even rivaled some closed-source models. On the SimpleQA benchmark, it also performed strongly, showing that its advantages carry over to simpler tasks.

Combining Complex Task Generation with Reinforcement Learning

WebSailor's core technology rests on two complementary modules: complex task generation and reinforcement learning. Together, they allow WebSailor to handle complex information-retrieval tasks more efficiently.

Complex Task Generation: To simulate real-world information environments, the research team built complex knowledge graphs through random walks, producing structures with a high degree of non-linearity and complexity. Each node represents an entity and each edge a relationship between two entities. The many possible combinations of entities and relations provide the foundation for generating tasks with high uncertainty.
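
The article does not describe the sampling procedure in detail. The following is a minimal sketch, assuming the knowledge graph is stored as an adjacency list of (relation, neighbor) pairs, of how a random walk can pull a small subgraph of (head, relation, tail) triples out of a larger graph. The toy graph, the entity names, and the function `random_walk_subgraph` are hypothetical illustrations, not WebSailor's code. Expanding from any previously visited entity, rather than only from the last one, is an assumed design choice that lets the sampled structure branch instead of forming a simple chain.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy knowledge graph: entity -> list of (relation, neighbor entity).
Graph = Dict[str, List[Tuple[str, str]]]
Triple = Tuple[str, str, str]

TOY_GRAPH: Graph = {
    "Entity A": [("founded", "Entity B"), ("located_in", "City X")],
    "Entity B": [("acquired_by", "Entity C"), ("founded_by", "Entity A")],
    "Entity C": [("headquartered_in", "City X")],
    "City X": [("part_of", "Country Y")],
    "Country Y": [],
}


def random_walk_subgraph(graph: Graph, start: str, num_steps: int, seed: int = 0) -> List[Triple]:
    """Sample a subgraph as (head, relation, tail) triples via a branching random walk."""
    rng = random.Random(seed)
    visited: List[str] = [start]
    triples: List[Triple] = []
    for _ in range(num_steps):
        # Expand from any visited entity, not only the last one, so the
        # sampled structure can branch instead of forming a single chain.
        node = rng.choice(visited)
        candidates = [
            (rel, nbr)
            for rel, nbr in graph.get(node, [])
            if (node, rel, nbr) not in triples
        ]
        if not candidates:
            continue  # dead end: skip this step
        rel, nbr = rng.choice(candidates)
        triples.append((node, rel, nbr))
        if nbr not in visited:
            visited.append(nbr)
    return triples


if __name__ == "__main__":
    for head, rel, tail in random_walk_subgraph(TOY_GRAPH, "Entity A", num_steps=5, seed=1):
        print(f"{head} --{rel}--> {tail}")
```

A question generator could then phrase a query over such a subgraph, so that answering it requires following several of the sampled relations.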

Reinforcement Learning Module: Reinforcement learning optimizes the model's behavior policy through interaction with the environment. WebSailor uses a two-stage training method. First, a rejection sampling fine-tuning (RFT) stage cold-starts the model. Then the reinforcement learning phase begins. For this phase, the research team proposed the DUPO algorithm, which uses a dynamic sampling strategy to make training more efficient and lets the model reach higher performance with fewer samples.
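
The article does not spell out how DUPO's dynamic sampling works. The following is a minimal sketch of one common form of dynamic sampling, assuming a group-based policy-gradient setup in which each question receives a group of rollouts with 0/1 rewards. If every rollout in a group earns the same reward, the group-relative advantages are all zero and the group carries no learning signal. The sketch drops such groups and refills the batch by duplicating the remaining informative groups instead of generating new rollouts. The names `group_advantages`, `fill_batch`, and `Group` are hypothetical. This is not the official DUPO implementation.

```python
import random
import statistics
from typing import List, Tuple

# A group = (question id, rewards of the rollouts sampled for that question).
Group = Tuple[str, List[float]]


def group_advantages(rewards: List[float]) -> List[float]:
    """Group-relative advantages: (r - mean) / std over one question's rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]


def fill_batch(groups: List[Group], batch_size: int, seed: int = 0) -> List[Group]:
    """Drop zero-variance groups, then duplicate informative ones to fill the batch."""
    rng = random.Random(seed)
    informative = [g for g in groups if statistics.pstdev(g[1]) > 0]
    if not informative:
        return []  # nothing to learn from in this batch
    batch = informative[:batch_size]
    while len(batch) < batch_size:
        batch.append(rng.choice(informative))
    return batch


if __name__ == "__main__":
    groups = [
        ("q1", [1.0, 1.0, 1.0, 1.0]),  # all correct: too easy
        ("q2", [0.0, 0.0, 0.0, 0.0]),  # all wrong: too hard
        ("q3", [1.0, 0.0, 0.0, 1.0]),
        ("q4", [0.0, 1.0, 0.0, 0.0]),
    ]
    for qid, rewards in fill_batch(groups, batch_size=4):
        print(qid, group_advantages(rewards))
```

Duplicating existing groups avoids paying for extra rollouts. This presumably matters for web agents, where a single rollout can involve many slow search and browsing calls.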

Innovative Methods to Enhance Task Complexity

To further increase task complexity, the research team applied information fuzzification when generating question-answer pairs. This technique replaces precise information with vague descriptions. The resulting questions are harder and require the model to perform more complex reasoning and information synthesis. By raising task difficulty in this way, the approach also pushes the model to develop stronger reasoning abilities.
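
As an illustration of the idea, here is a minimal sketch, assuming facts are stored as simple key-value records, of how exact values can be swapped for vaguer descriptions while the answer stays precise. The record fields, the bucketing rules (early/mid/late decade, rounding a count down to its leading digit), and the functions `fuzzify_year`, `fuzzify_count`, and `fuzzify_fact` are all hypothetical choices made for this example. They are not the rules WebSailor actually uses.

```python
from typing import Dict, Tuple, Union

Fact = Dict[str, Union[str, int]]


def fuzzify_year(year: int) -> str:
    """Replace an exact year with a coarse period, e.g. 2013 -> 'the early 2010s'."""
    decade = year - year % 10
    offset = year % 10
    part = "early" if offset <= 3 else "mid" if offset <= 6 else "late"
    return f"the {part} {decade}s"


def fuzzify_count(n: int) -> str:
    """Replace an exact count with a rounded-down bound, e.g. 5321 -> 'at least 5000'."""
    if n < 10:
        return "fewer than 10"
    magnitude = 10 ** (len(str(n)) - 1)
    return f"at least {n // magnitude * magnitude}"


def fuzzify_fact(fact: Fact) -> Tuple[str, str]:
    """Turn a precise fact into a vaguer question; the answer stays exact."""
    question = (
        f"Which organization, founded in {fuzzify_year(int(fact['founded']))}, "
        f"has {fuzzify_count(int(fact['members']))} members?"
    )
    return question, str(fact["name"])


if __name__ == "__main__":
    q, a = fuzzify_fact({"name": "Hypothetical Org", "founded": 2013, "members": 5321})
    print(q)  # Which organization, founded in the early 2010s, has at least 5000 members?
    print(a)  # Hypothetical Org
```

Because many entities can match "the early 2010s" and "at least 5000", a model can no longer answer with a single lookup. It has to gather candidates and cross-check them against the other clues.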

With the release of WebSailor, Alibaba has taken another step forward in artificial intelligence. Open-sourcing the framework helps spread and advance the technology and gives developers more room to explore and experiment. Going forward, WebSailor is expected to show even greater potential in areas such as information retrieval and intelligent Q&A.

Open source address: https://github.com/Alibaba-NLP/WebAgent