First Hallucination Tokens Are Different from Conditional Ones

Abstract

Hallucination, the generation of untruthful content, is one of the major concerns regarding foundational models. Detecting hallucinations at the token level is vital for real-time filtering and targeted correction, yet the variation of hallucination signals within token sequences is not fully understood. Leveraging the RAGTruth corpus with token-level annotations and reproduced logits, we analyse how these signals depend on a token's position within hallucinated spans, contributing to an improved understanding of token-level hallucination. Our results show that the first hallucinated token carries a stronger signal and is more detectable than conditional tokens. We release our analysis framework, along with code for logit reproduction and metric computation, at https://github.com/jakobsnl/RAGTruth_Xtended.
