Abstract
Retrieval-Augmented Generation (RAG) integrates external knowledge with Large Language Models (LLMs) to enhance factual correctness and mitigate hallucination. However, dense retrievers often become the bottleneck of RAG systems because they have far fewer parameters than LLMs and cannot perform step-by-step reasoning. Prompt-based iterative RAG attempts to address these limitations, but it is constrained by human-designed workflows. To overcome this constraint, we propose $\textbf{R3-RAG}$, which uses $\textbf{R}$einforcement learning to make the LLM learn how to $\textbf{R}$eason and $\textbf{R}$etrieve step by step, thus retrieving comprehensive external knowledge and reaching correct answers. R3-RAG is trained in two stages. We first use a cold start to teach the model to iteratively interleave reasoning and retrieval. We then use reinforcement learning to strengthen its ability to explore the external retrieval environment. Specifically, we propose two rewards for R3-RAG: 1) answer correctness as the outcome reward, which judges whether the trajectory leads to a correct answer; 2) relevance-based document verification as the process reward, which encourages the model to retrieve documents that are relevant to the user question. Together, these rewards let the model learn how to iteratively reason and retrieve relevant documents to reach the correct answer. Experimental results show that R3-RAG significantly outperforms baselines and transfers well to different retrievers. We release R3-RAG at https://github.com/Yuan-Li-FNLP/R3-RAG.
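To make the two reward signals concrete, the following is a minimal sketch of how an outcome reward and a process reward might be combined for one trajectory. It is not the released implementation: the `Step` structure, the function names, the token-overlap relevance score, and the `process_weight` mixing coefficient are all assumptions introduced for illustration, and the abstract does not state how the two rewards are weighted or how relevance is judged.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One reasoning-and-retrieval step of a trajectory (hypothetical structure)."""
    query: str
    documents: List[str] = field(default_factory=list)


def outcome_reward(predicted: str, gold: str) -> float:
    """Outcome reward: 1.0 if the normalized answers match exactly, else 0.0."""
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0


def overlap_relevance(question: str, document: str) -> float:
    """Toy stand-in for a relevance judge: share of question tokens found in the document."""
    q_tokens = set(question.lower().split())
    d_tokens = set(document.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


def process_rewards(
    question: str,
    steps: List[Step],
    relevance: Callable[[str, str], float] = overlap_relevance,
) -> List[float]:
    """Process reward per step: best relevance among the documents retrieved at that step."""
    return [
        max((relevance(question, doc) for doc in step.documents), default=0.0)
        for step in steps
    ]


def trajectory_reward(
    question: str,
    steps: List[Step],
    predicted: str,
    gold: str,
    process_weight: float = 0.5,  # assumed mixing coefficient, not from the paper
) -> float:
    """Outcome reward plus a weighted mean of the per-step process rewards."""
    per_step = process_rewards(question, steps)
    mean_process = sum(per_step) / len(per_step) if per_step else 0.0
    return outcome_reward(predicted, gold) + process_weight * mean_process


if __name__ == "__main__":
    question = "who wrote hamlet"
    steps = [
        Step(query="hamlet author", documents=["paris is in france"]),
        Step(query="hamlet playwright", documents=["shakespeare wrote hamlet"]),
    ]
    print(process_rewards(question, steps))
    print(trajectory_reward(question, steps, "Shakespeare", "shakespeare"))
```

In this sketch a trajectory that retrieves relevant documents still earns partial credit when the final answer is wrong, which is the intuition behind pairing a process reward with an outcome reward. In practice the overlap score would be replaced by a learned or model-based relevance judge.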