ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning

Abstract

Conversational search systems require effective handling of context-dependent queries that often contain ambiguity, omission, and coreference. Conversational Query Reformulation (CQR) addresses this challenge by transforming these queries into self-contained forms suitable for off-the-shelf retrievers. However, existing CQR approaches suffer from two critical constraints: high dependency on costly external supervision from human annotations or large language models, and insufficient alignment between the rewriting model and downstream retrievers. We present ConvSearch-R1, the first self-driven framework that completely eliminates dependency on external rewrite supervision by leveraging reinforcement learning to optimize reformulation directly through retrieval signals. Our novel two-stage approach combines Self-Driven Policy Warm-Up to address the cold-start problem through retrieval-guided self-distillation, followed by Retrieval-Guided Reinforcement Learning with a specially designed rank-incentive reward shaping mechanism that addresses the sparsity issue in conventional retrieval metrics. Extensive experiments on TopiOCQA and QReCC datasets demonstrate that ConvSearch-R1 significantly outperforms previous state-of-the-art methods, achieving over 10% improvement on the challenging TopiOCQA dataset while using smaller 3B parameter models without any external supervision.
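
To illustrate the sparsity issue the abstract mentions, the following is a minimal sketch and not the paper's reward function. It contrasts a cutoff-based reward (in the spirit of Recall@k) with a hypothetical dense, rank-based reward. The function names, the linear decay, and the values of `k` and `max_rank` are all assumptions made for illustration.

```python
from typing import Optional


def sparse_reward(rank: Optional[int], k: int = 10) -> float:
    """Cutoff-style reward: 1.0 if the gold passage is in the top-k, else 0.0.

    Illustrative only; `rank` is the 1-based position of the gold passage,
    or None if it was not retrieved at all.
    """
    return 1.0 if rank is not None and rank <= k else 0.0


def shaped_reward(rank: Optional[int], max_rank: int = 100) -> float:
    """Hypothetical dense reward: decays linearly from 1.0 at rank 1.

    This is an assumed shaping function, not the one used by ConvSearch-R1.
    Ranks beyond `max_rank` (or a missing passage) still receive 0.0.
    """
    if rank is None or rank > max_rank:
        return 0.0
    return 1.0 - (rank - 1) / max_rank


if __name__ == "__main__":
    # Two rewrites that both miss the top-10 look identical under the sparse
    # reward, but the shaped reward still prefers the better-ranked one.
    for rank in (1, 50, 80, None):
        print(rank, sparse_reward(rank), shaped_reward(rank))
```

Under the cutoff reward, a rewrite that moves the gold passage from rank 80 to rank 50 receives no learning signal, while a rank-based reward of this kind still distinguishes the two.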
