R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

Abstract

Large Language Models (LLMs) are powerful but prone to hallucinations because their knowledge is static. Retrieval-Augmented Generation (RAG) mitigates this by injecting external information, but current methods are often costly, generalize poorly, or ignore the model's internal knowledge. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT cold-start phase for preliminary format learning, followed by an RL phase for dynamic knowledge acquisition. The RL stage uses outcome supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism that continuously assimilates retrieved information, thereby enriching the model's internal knowledge. By leveraging both its internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods while retrieving efficiently. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.
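The abstract does not give the exact form of the RL reward. The following is a minimal, hypothetical sketch of how an outcome-supervised reward could be combined with a bonus for relying on internal knowledge. It assumes an exact-match answer reward and a bonus (hypothetically `bonus=0.5`) for correct rollouts in a sampled group that used the fewest search-engine calls. The names `Rollout`, `outcome_reward`, `internal_knowledge_bonus`, and `total_rewards` are illustrative and are not taken from the R1-Searcher++ code.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    answer: str          # final answer produced by the policy (hypothetical field)
    num_retrievals: int  # number of external search calls made (hypothetical field)


def outcome_reward(answer: str, gold: str) -> float:
    # Outcome supervision: only the final answer is scored.
    # Exact match after normalization is an assumption for this sketch.
    return 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0


def internal_knowledge_bonus(rollouts: list[Rollout], gold: str,
                             bonus: float = 0.5) -> list[float]:
    # Hypothetical bonus: among correct rollouts in the group, reward those
    # that used the fewest retrieval calls (i.e. leaned on internal knowledge).
    correct = [r for r in rollouts if outcome_reward(r.answer, gold) == 1.0]
    if not correct:
        return [0.0] * len(rollouts)
    fewest = min(r.num_retrievals for r in correct)
    return [
        bonus if outcome_reward(r.answer, gold) == 1.0 and r.num_retrievals == fewest
        else 0.0
        for r in rollouts
    ]


def total_rewards(rollouts: list[Rollout], gold: str,
                  bonus: float = 0.5) -> list[float]:
    # Final reward per rollout = outcome reward + internal-knowledge bonus.
    bonuses = internal_knowledge_bonus(rollouts, gold, bonus)
    return [outcome_reward(r.answer, gold) + b for r, b in zip(rollouts, bonuses)]


group = [Rollout("Paris", 0), Rollout("Paris", 2), Rollout("Lyon", 0)]
print(total_rewards(group, "Paris"))  # [1.5, 1.0, 0.0]
```

In this sketch, a correct answer reached without searching earns more than one reached after two searches, and a wrong answer earns nothing regardless of how few searches it used. The bonus is tied to correctness so that the policy is not rewarded for skipping retrieval and guessing.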
