Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs

Abstract

Knowledge Graph Question Answering (KGQA) aims to answer natural language questions by reasoning over structured knowledge graphs (KGs). While large language models (LLMs) have advanced KGQA through their strong reasoning capabilities, existing methods still struggle to fully exploit both the rich knowledge encoded in KGs and the reasoning capabilities of LLMs, particularly in complex scenarios. They often assume complete KG coverage and lack mechanisms to judge when external information is needed. Their reasoning also remains locally myopic: it fails to maintain coherent multi-step planning, which leads to reasoning failures even when the relevant knowledge exists. We propose Graph-RFT, a novel two-stage reinforcement fine-tuning KGQA framework with a 'plan-KGsearch-and-Websearch-during-think' paradigm that enables LLMs to perform autonomous planning and adaptive retrieval scheduling across KG and web sources under incomplete knowledge conditions. Graph-RFT first introduces a chain-of-thought fine-tuning method, built on a customized plan-retrieval dataset, that activates structured reasoning and resolves the GRPO cold-start problem. It then introduces a novel plan-retrieval guided reinforcement learning process that integrates explicit planning and retrieval actions with a multi-reward design, enabling coverage-aware retrieval scheduling. It employs a Cartesian-inspired planning module to decompose complex questions into ordered subquestions, and logical expressions to guide tool invocation for globally consistent multi-step reasoning. This reasoning-retrieval process is optimized with a multi-reward design combining outcome and retrieval-specific signals, enabling the model to learn when and how to combine KG and web retrieval effectively.
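
To make the plan-then-retrieve idea concrete, the following is a minimal illustrative sketch and not the Graph-RFT implementation. It assumes a question has already been decomposed into ordered subquestions, each represented as a hypothetical (subject, relation) pair, and it replaces the learned retrieval policy with a fixed rule: query the KG first and fall back to web search only when the KG returns nothing. The names `kg_search`, `answer_plan`, `toy_kg`, and `toy_web`, as well as the `#k` back-reference convention, are assumptions introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy KG: (subject, relation) -> list of objects.
Triples = Dict[Tuple[str, str], List[str]]


def kg_search(kg: Triples, subject: str, relation: str) -> List[str]:
    """Look up a (subject, relation) pair; an empty list means a coverage gap."""
    return kg.get((subject, relation), [])


def answer_plan(
    plan: List[Tuple[str, str]],
    kg: Triples,
    web_search: Callable[[str, str], List[str]],
) -> Tuple[str, List[str]]:
    """Answer ordered subquestions, preferring the KG and falling back to the web.

    A subject of the form "#k" refers to the answer of step k (1-indexed).
    This rule-based fallback is a stand-in for a learned retrieval policy.
    """
    answers: List[str] = []
    sources: List[str] = []
    for subject, relation in plan:
        if subject.startswith("#"):
            # Resolve a back-reference to an earlier subquestion's answer.
            subject = answers[int(subject[1:]) - 1]
        results = kg_search(kg, subject, relation)
        source = "kg"
        if not results:
            # The KG has no matching triple, so try the external source.
            results = web_search(subject, relation)
            source = "web"
        if not results:
            raise LookupError(f"no evidence for ({subject}, {relation})")
        answers.append(results[0])
        sources.append(source)
    return answers[-1], sources


# Toy data: the KG knows the director but lacks the birthplace triple.
toy_kg: Triples = {("Inception", "directed_by"): ["Christopher Nolan"]}
toy_web: Triples = {("Christopher Nolan", "born_in"): ["London"]}

# "Where was the director of Inception born?" as two ordered subquestions.
plan = [("Inception", "directed_by"), ("#1", "born_in")]
answer, sources = answer_plan(plan, toy_kg, lambda s, r: toy_web.get((s, r), []))
print(answer, sources)
```

In this sketch the second subquestion depends on the first, so the order of the plan matters, and the `sources` list records which steps needed the web fallback. According to the abstract, Graph-RFT learns this scheduling decision through reinforcement learning rather than hard-coding it.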
