Researchers from Tongyi Lab and Peking University recently introduced ZeroSearch, a framework that activates the retrieval capabilities of large language models without querying a real search engine during training, reducing training costs by 88%. It offers a new approach to training and applying large language models.

Traditional training methods usually rely on real search engines to obtain information, which incurs high API call costs and can hurt model performance when the quality of search results is unstable. ZeroSearch instead uses a large language model as a "simulated search engine," drawing on the knowledge it accumulated during pre-training to generate retrieval documents, which avoids the cost and noise of real searches.
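
The idea of swapping a search API for a model call can be sketched roughly as follows. This is a minimal illustration, not the project's code: the prompt wording, the function names `build_simulation_prompt` and `simulated_search`, and the offline stand-in `fake_llm` are all assumptions made for this example.

```python
from typing import Callable, List


def build_simulation_prompt(query: str, num_docs: int = 3) -> str:
    """Compose a prompt asking an LLM to play a search engine (hypothetical wording)."""
    return (
        f"You are a search engine. Write {num_docs} short documents "
        f"relevant to the query below, one per line.\n"
        f"Query: {query}\n"
    )


def simulated_search(query: str, llm: Callable[[str], str], num_docs: int = 3) -> List[str]:
    """Return documents generated by `llm` instead of calling a real search API."""
    raw = llm(build_simulation_prompt(query, num_docs))
    # Treat each non-empty line of the model output as one document.
    docs = [line.strip() for line in raw.splitlines() if line.strip()]
    return docs[:num_docs]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline (assumption)."""
    query = prompt.split("Query: ", 1)[1].strip()
    return "\n".join(f"Doc {i}: background text about {query}" for i in range(1, 6))


if __name__ == "__main__":
    for doc in simulated_search("ZeroSearch", fake_llm, num_docs=2):
        print(doc)
```

In a real setup, `fake_llm` would be replaced by a call to an actual language model; the rest of the training loop would only see a list of documents, just as it would with a search API.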

Paper address: https://arxiv.org/pdf/2505.04588

Code address: https://github.com/Alibaba-NLP/ZeroSearch

Project homepage: https://alibaba-nlp.github.io/ZeroSearch

Hugging Face homepage: https://huggingface.co/collections/sunhaonlp/zerosearch-v2-6827f4ee6b6265069d443d4e

The framework adopts a structured training template that guides the model to reason and act in an orderly way at each interaction step. This makes the model's reasoning path clearer and the final answer easier to extract. ZeroSearch also improves the quality of the generated documents through a strategy called "simulation fine-tuning," which is meant to keep the output practical and reliable.
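
To show why a structured template simplifies answer extraction, here is a small sketch. The tag names `<think>`, `<search>`, and `<answer>` are assumptions chosen for illustration, since the article does not spell out the template; the helper names are hypothetical as well.

```python
import re
from typing import List, Optional

# Assumed tag names; the real template may differ.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)


def extract_answer(rollout: str) -> Optional[str]:
    """Return the text of the first answer block, or None if there is none."""
    match = ANSWER_RE.search(rollout)
    return match.group(1).strip() if match else None


def extract_queries(rollout: str) -> List[str]:
    """Return every search query the model issued, in order."""
    return [q.strip() for q in SEARCH_RE.findall(rollout)]


if __name__ == "__main__":
    rollout = (
        "<think>I need the release year.</think>\n"
        "<search>ZeroSearch paper year</search>\n"
        "<think>The documents say 2025.</think>\n"
        "<answer> 2025 </answer>"
    )
    print(extract_queries(rollout))
    print(extract_answer(rollout))
```

Because each step sits inside its own delimiters, the final answer can be pulled out with a single pattern match rather than by parsing free-form text.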

In experiments, ZeroSearch significantly outperformed traditional methods that depend on real search engines and showed strong generalization and stability. Its performance continues to improve as the model's parameter count increases. The study advances large language model technology and opens up new possibilities for intelligent search and information retrieval applications.

In summary, ZeroSearch changes how large language models can be trained to search, pointing to a more cost-effective and efficient path for intelligent information retrieval.