ZeroSearch: Incentivize the Search Capability of LLMs without Searching

  • 2025-05-07 18:30:22
  • Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang

Abstract

Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability. To address these challenges, we introduce ZeroSearch, a reinforcement learning framework that incentivizes the search capabilities of LLMs without interacting with real search engines. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both relevant and noisy documents in response to a query. During RL training, we employ a curriculum-based rollout strategy that incrementally degrades the quality of generated documents, progressively eliciting the model's reasoning ability by exposing it to increasingly challenging retrieval scenarios. Extensive experiments demonstrate that ZeroSearch effectively incentivizes the search capabilities of LLMs using a 3B LLM as the retrieval module. Remarkably, a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it. Furthermore, it generalizes well across both base and instruction-tuned models of various parameter sizes and is compatible with a wide range of RL algorithms.
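
The curriculum-based rollout described in the abstract can be pictured as a schedule that raises the chance of serving a noisy document as training proceeds. The sketch below is only an illustration under stated assumptions: the abstract does not give the schedule, so the exponential shape, the function names `noise_probability` and `choose_document_mode`, and the parameters `p_start`, `p_end`, and `base` are all hypothetical and not taken from the paper.

```python
import random


def noise_probability(step, total_steps, p_start=0.0, p_end=0.75, base=4.0):
    """Hypothetical schedule: probability of a noisy document at `step`.

    Assumption: the probability grows exponentially from p_start to p_end.
    `base` must be greater than 1 (base == 1 would divide by zero).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if base <= 1.0:
        raise ValueError("base must be greater than 1")
    # Clamp training progress to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    # Growth factor is 0 at the start of training and 1 at the end.
    growth = (base ** frac - 1.0) / (base - 1.0)
    return p_start + growth * (p_end - p_start)


def choose_document_mode(step, total_steps, rng=random):
    """Decide whether the simulated retriever is asked for a noisy or a relevant document."""
    p_noisy = noise_probability(step, total_steps)
    return "noisy" if rng.random() < p_noisy else "relevant"


if __name__ == "__main__":
    for step in (0, 50, 100):
        print(step, round(noise_probability(step, 100), 3))
```

Under these assumed defaults, early rollouts see mostly relevant documents and late rollouts see mostly noisy ones, which is one simple way to realize "incrementally degrades the quality of generated documents".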

 
