Abstract
Detecting whether an LLM hallucinates is an important research challenge. One promising way of doing so is to estimate the semantic entropy (Farquhar et al., 2024) of the distribution of generated sequences. We propose a new algorithm for this estimation, with two main advantages. First, because we take a Bayesian approach, we obtain substantially better semantic entropy estimates for a given budget of samples from the LLM. Second, we can tune the number of samples adaptively, so that "harder" contexts receive more samples. We demonstrate empirically that our approach systematically outperforms the baselines, requiring only 59% of the samples used by Farquhar et al. (2024) to achieve the same hallucination-detection quality, as measured by AUROC. Moreover, and quite counterintuitively, our estimator is useful even with just one sample from the LLM.