SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Abstract

Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements which can undermine trust in their output. Existing fact-checking approaches either require access to the output probability distribution (which may not be available for systems such as ChatGPT) or external databases that are interfaced via separate, often complex, modules. In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can be used to fact-check the responses of black-box models in a zero-resource fashion, i.e. without an external database. SelfCheckGPT leverages the simple idea that if an LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another. We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset, and manually annotate the factuality of the generated passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality. We compare our approach to several baselines and show that our approach has considerably higher AUC-PR scores in sentence-level hallucination detection and higher correlation scores in passage-level factuality assessment compared to grey-box methods.
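
The sampling-consistency idea described above can be illustrated with a minimal sketch. This is not the scoring method used in the paper: the word-overlap measure, the function names (`tokenize`, `inconsistency_score`), and the example sentences are all assumptions chosen for illustration. In practice the samples would be drawn stochastically from the same LLM for the same prompt; here they are hard-coded so the example runs on its own.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def inconsistency_score(sentence, samples):
    """Score in [0, 1]; higher means the sentence is less supported by the samples.

    Assumption: support is approximated by the fraction of the sentence's
    words that also appear in each sampled passage, averaged over samples.
    This is a crude stand-in for a real consistency measure.
    """
    words = tokenize(sentence)
    if not words or not samples:
        return 1.0
    support = [len(words & tokenize(sample)) / len(words) for sample in samples]
    return 1.0 - sum(support) / len(support)


# Hypothetical sampled responses (in practice, drawn from the LLM itself).
samples = [
    "Ada Lovelace was born in London in 1815.",
    "Born in London in 1815, Ada Lovelace was a mathematician.",
    "Ada Lovelace was an English mathematician born in 1815 in London.",
]

# Sentences from the response being checked.
consistent = "Ada Lovelace was born in London."
divergent = "Ada Lovelace won the Nobel Prize in Physics."

score_consistent = inconsistency_score(consistent, samples)
score_divergent = inconsistency_score(divergent, samples)
print(score_consistent, score_divergent)
```

A sentence that the samples repeatedly agree with receives a low score, while a sentence the samples do not reproduce receives a higher one. Plain word overlap cannot detect contradictions that reuse the same vocabulary, so a realistic implementation would need a stronger comparison between the sentence and each sample.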
