Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Abstract

When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
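
To make the lookback ratio feature concrete, the following is a minimal sketch of how one such feature per attention head might be computed for a single decoding step, and how a linear classifier could score the result. The abstract only states that the feature is a ratio of attention on the context versus newly generated tokens; the normalization used here (mean context attention divided by the sum of mean context and mean new-token attention), the function names `lookback_ratio` and `linear_score`, and the toy attention values are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def lookback_ratio(attn, n_context):
    """Per-head lookback ratio for one decoding step (illustrative sketch).

    attn: array of shape (num_heads, num_tokens) holding the attention
          weights of the current step over all earlier tokens.
    n_context: number of leading positions that belong to the provided
          context (assumed >= 1); the rest are newly generated tokens.
    """
    attn = np.asarray(attn, dtype=float)
    ctx = attn[:, :n_context].mean(axis=1)        # mean attention on context
    new_part = attn[:, n_context:]
    # If nothing has been generated yet, treat attention on new tokens as zero.
    if new_part.shape[1] > 0:
        new = new_part.mean(axis=1)
    else:
        new = np.zeros(attn.shape[0])
    # Assumed normalization: ratio lies in [0, 1]; epsilon avoids 0/0.
    return ctx / (ctx + new + 1e-12)

def linear_score(features, weights, bias):
    """Logistic score of a linear classifier over lookback ratio features."""
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))

# Toy example: 2 heads, 2 context tokens followed by 2 generated tokens.
attn = np.array([[0.20, 0.20, 0.30, 0.30],
                 [0.05, 0.05, 0.40, 0.50]])
ratios = lookback_ratio(attn, n_context=2)

# Hypothetical classifier parameters, chosen only to show the call shape.
score = linear_score(ratios, weights=np.array([1.0, 1.0]), bias=0.0)
```

In a full detector, the ratios from every head and layer would presumably be gathered into one feature vector per span of generated text, and the classifier weights would be fit on labeled examples rather than set by hand.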
