WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection

  • 2025-10-21 16:52:00
  • Guanzhong He, Zhen Yang, Jinxin Liu, Bin Xu, Lei Hou, Juanzi Li

Abstract

Search agents have achieved significant advancements in enabling intelligent information retrieval and decision-making within interactive environments. Although reinforcement learning has been employed to train agentic models capable of more dynamic interactive retrieval, existing methods are limited by shallow tool-use depth and the accumulation of errors over multiple iterative interactions. In this paper, we present WebSeer, a more intelligent search agent trained via reinforcement learning enhanced with a self-reflection mechanism. Specifically, we construct a large dataset annotated with reflection patterns and design a two-stage training framework that unifies cold start and reinforcement learning within the self-reflection paradigm for real-world web-based environments, which enables the model to generate longer and more reflective tool-use trajectories. Our approach substantially extends tool-use chains and improves answer accuracy. Using a single 14B model, we achieve state-of-the-art results on HotpotQA and SimpleQA, with accuracies of 72.3% and 90.0%, respectively, and demonstrate strong generalization to out-of-distribution datasets. The code is available at https://github.com/99hgz/WebSeer

 
