GlobalRAG: Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement Learning

Abstract

Reinforcement learning has recently shown promise in improving retrieval-augmented generation (RAG). Despite these advances, its effectiveness in multi-hop question answering (QA) remains constrained by two fundamental limitations: (i) the absence of global planning to structure multi-step reasoning, and (ii) unfaithful execution, which hinders effective query formulation and consistent use of retrieved evidence. We propose GlobalRAG, a reinforcement learning framework designed to enhance global reasoning in multi-hop QA. GlobalRAG decomposes questions into subgoals, coordinates retrieval with reasoning, and refines evidence iteratively. To guide this process, we introduce a Planning Quality Reward and a SubGoal Completion Reward, which encourage coherent planning and reliable subgoal execution. In addition, a progressive weight annealing strategy balances process-oriented and outcome-based objectives. Extensive experiments on both in-domain and out-of-domain benchmarks demonstrate that GlobalRAG significantly outperforms strong baselines while using only 8k training samples (42% of the training data used by strong baselines), achieving average improvements of 14.2% in both EM and F1.
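
As an illustration of how a progressive weight annealing strategy might balance process-oriented and outcome-based objectives, the sketch below blends the two reward types with a weight that decays over training. The abstract does not specify the schedule or the combination rule, so the linear decay, the equal averaging of the two process rewards, and every name and default value here (`annealed_reward`, `w_start`, `w_end`) are assumptions made for illustration, not the paper's formulation.

```python
def annealed_reward(planning, subgoal, outcome, step, total_steps,
                    w_start=0.5, w_end=0.0):
    """Blend process and outcome rewards (hypothetical sketch).

    planning, subgoal: process-oriented rewards in [0, 1]
    outcome:           outcome-based reward in [0, 1], e.g. answer F1
    w_start, w_end:    assumed process-reward weights at the start and
                       end of training (illustrative values only)
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")

    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)

    # Assumed linear schedule: the process weight moves from w_start to w_end.
    w = w_start + (w_end - w_start) * progress

    # Assumed equal weighting of the two process-oriented rewards.
    process = 0.5 * (planning + subgoal)

    return w * process + (1.0 - w) * outcome
```

Under these assumptions, early training steps reward well-formed plans and completed subgoals even when the final answer is wrong, while late steps are scored almost entirely on the final answer.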
