Learning to Reason for Hallucination Span Detection

Abstract

Large language models (LLMs) often generate hallucinations -- unsupportedcontent that undermines reliability. While most prior works frame hallucinationdetection as a binary task, many real-world applications require identifyinghallucinated spans, which is a multi-step decision making process. Thisnaturally raises the question of whether explicit reasoning can help thecomplex task of detecting hallucination spans. To answer this question, wefirst evaluate pretrained models with and without Chain-of-Thought (CoT)reasoning, and show that CoT reasoning has the potential to generate at leastone correct answer when sampled multiple times. Motivated by this, we proposeRL4HS, a reinforcement learning framework that incentivizes reasoning with aspan-level reward function. RL4HS builds on Group Relative Policy Optimizationand introduces Class-Aware Policy Optimization to mitigate reward imbalanceissue. Experiments on the RAGTruth benchmark (summarization, questionanswering, data-to-text) show that RL4HS surpasses pretrained reasoning modelsand supervised fine-tuning, demonstrating the necessity of reinforcementlearning with span-level rewards for detecting hallucination spans.

Quick Read (beta)

loading the full paper ...