From Sufficiency to Reflection: Reinforcement-Guided Thinking Quality in Retrieval-Augmented Reasoning for LLMs

Abstract

Reinforcement learning-based retrieval-augmented generation (RAG) methods enhance the reasoning abilities of large language models (LLMs). However, most rely only on final-answer rewards, overlooking intermediate reasoning quality. This paper analyzes existing RAG reasoning models and identifies three main failure patterns: (1) information insufficiency, meaning the model fails to retrieve adequate support; (2) faulty reasoning, where logical or content-level flaws appear despite sufficient information; and (3) answer-reasoning inconsistency, where a valid reasoning chain leads to a mismatched final answer. We propose TIRESRAG-R1, a novel framework using a think-retrieve-reflect process and a multi-dimensional reward system to improve reasoning and stability. TIRESRAG-R1 introduces: (1) a sufficiency reward to encourage thorough retrieval; (2) a reasoning quality reward to assess the rationality and accuracy of the reasoning chain; and (3) a reflection reward to detect and revise errors. It also employs a difficulty-aware reweighting strategy and training sample filtering to boost performance on complex tasks. Experiments on four multi-hop QA datasets show that TIRESRAG-R1 outperforms prior RAG methods and generalizes well to single-hop tasks. The code and data are available at: https://github.com/probe2/TIRESRAG-R1.
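
To make the idea of a multi-dimensional reward concrete, the following is a minimal illustrative sketch of how a final-answer reward might be combined with sufficiency, reasoning quality, and reflection signals, and then reweighted by question difficulty. It is not the TIRESRAG-R1 implementation: the names `RewardSignals`, `difficulty_weight`, `combined_reward`, the `aux_weight` parameter, the score ranges, and the weighting formula are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RewardSignals:
    """Hypothetical per-rollout scores; all ranges are assumptions."""
    answer: float       # final-answer correctness, assumed in [0, 1]
    sufficiency: float  # whether retrieval gathered enough support, assumed in [0, 1]
    reasoning: float    # quality of the reasoning chain, assumed in [0, 1]
    reflection: float   # whether reflection corrected an error, assumed in [0, 1]


def difficulty_weight(group_answer_scores: Sequence[float]) -> float:
    """Assumed difficulty weight in [1, 2].

    Questions whose sampled rollouts are mostly wrong (low mean answer
    score) are treated as harder and receive a larger weight.
    """
    if not group_answer_scores:
        raise ValueError("need at least one rollout score")
    mean_acc = sum(group_answer_scores) / len(group_answer_scores)
    return 1.0 + (1.0 - mean_acc)


def combined_reward(
    signals: RewardSignals,
    group_answer_scores: Sequence[float],
    aux_weight: float = 0.3,  # assumed mixing coefficient, not from the paper
) -> float:
    """Answer reward plus a weighted mean of the three auxiliary rewards,
    scaled by the assumed difficulty weight."""
    aux = (signals.sufficiency + signals.reasoning + signals.reflection) / 3.0
    base = signals.answer + aux_weight * aux
    return difficulty_weight(group_answer_scores) * base
```

In this sketch, a rollout with a wrong final answer can still earn partial credit when its retrieval and reasoning are sound, which is the kind of intermediate signal that a final-answer-only reward overlooks.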
