Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing

  • 2025-05-05 18:39:35
  • Diji Yang, Linda Zeng, Jinmeng Rao, Yi Zhang

Abstract

Retrieval Augmented Generation (RAG) has shown strong capability in enhancing language models' knowledge and reducing AI generative hallucinations, driving its widespread use. However, complex tasks requiring multi-round retrieval remain challenging, and early attempts tend to be overly optimistic without a good sense of self-skepticism. Current multi-round RAG systems may continue searching even when enough information has already been retrieved, or they may provide incorrect answers without having sufficient information or knowledge. Existing solutions either require large amounts of expensive human-labeled process supervision data or lead to subpar performance.

This paper aims to address these limitations by introducing a new framework, SIM-RAG, to explicitly enhance RAG systems' self-awareness and multi-round retrieval capabilities. To train SIM-RAG, we first let a RAG system self-practice multi-round retrieval, augmenting existing question-answer pairs with intermediate inner monologue reasoning steps to generate synthetic training data. For each pair, the system may explore multiple retrieval paths, which are labeled as successful if they reach the correct answer and unsuccessful otherwise. Using this data, we train a lightweight information sufficiency Critic. At inference time, the Critic evaluates whether the RAG system has retrieved sufficient information at each round, guiding retrieval decisions and improving system-level self-awareness through in-context reinforcement learning.

Experiments across multiple prominent RAG benchmarks show that SIM-RAG is an effective multi-round RAG solution. Furthermore, this framework is system-efficient, adding a lightweight component to RAG without requiring modifications to existing LLMs or search engines, and data-efficient, eliminating the need for costly human-annotated mid-step retrieval process supervision data.
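The two stages described in the abstract (labeling self-practice retrieval paths, then letting a Critic decide at each round whether to stop searching) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `Trajectory`, `label_practice_paths`, `multi_round_rag`, and the toy `retrieve`/`reason`/`critic` callables are hypothetical names; a case-insensitive exact string match stands in for "reaching the correct answer"; and the Critic here is a plain rule-based function rather than a trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    """Hypothetical record of one multi-round retrieval path."""
    question: str
    queries: List[str] = field(default_factory=list)
    passages: List[str] = field(default_factory=list)
    answer: str = ""


def label_practice_paths(paths: List[Trajectory], gold: str) -> List[Tuple[str, str, str, int]]:
    """Turn self-practice paths into (question, context, answer, label) examples.

    Assumption: a path is "successful" (label 1) if its final answer matches the
    gold answer after case-insensitive comparison, otherwise "unsuccessful" (0).
    """
    examples = []
    for traj in paths:
        context = " ".join(traj.passages)
        label = int(traj.answer.strip().lower() == gold.strip().lower())
        examples.append((traj.question, context, traj.answer, label))
    return examples


def multi_round_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    reason: Callable[[str, List[str]], Tuple[str, str]],
    critic: Callable[[str, List[str], str], bool],
    max_rounds: int = 5,
) -> Trajectory:
    """Retrieve until the Critic judges the information sufficient or rounds run out."""
    traj = Trajectory(question)
    for _ in range(max_rounds):
        # The reasoner proposes an answer plus a follow-up query from what it has so far.
        answer, next_query = reason(question, traj.passages)
        traj.answer = answer
        if critic(question, traj.passages, answer):
            break  # Critic accepts: stop searching and keep this answer.
        traj.queries.append(next_query)
        traj.passages.extend(retrieve(next_query))
    return traj


# --- Toy components for illustration only (hypothetical) ---
CORPUS = {
    "capital france": "Paris is the capital of France.",
    "eiffel tower city": "The Eiffel Tower is located in Paris.",
}


def toy_retrieve(query: str) -> List[str]:
    # Exact-key lookup stands in for a real search engine.
    return [text for key, text in CORPUS.items() if key == query]


def toy_reason(question: str, passages: List[str]) -> Tuple[str, str]:
    # Answer "Paris" once a passage mentions it; otherwise request a follow-up search.
    if any("Paris" in p for p in passages):
        return "Paris", ""
    return "unknown", "capital france"


def toy_critic(question: str, passages: List[str], answer: str) -> bool:
    # Sufficient only if the proposed answer is supported by some retrieved passage.
    return answer != "unknown" and any(answer in p for p in passages)


if __name__ == "__main__":
    result = multi_round_rag("What is the capital of France?", toy_retrieve, toy_reason, toy_critic)
    print(result.answer, result.queries)
```

In this toy run, the first round has no evidence, so the Critic rejects "unknown" and one search is issued; in the second round the Critic accepts "Paris" because a retrieved passage supports it. Keeping the stop decision in a separate `critic` callable mirrors the abstract's point that the Critic is an add-on component, leaving the reasoner and the retriever unchanged.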

 
