Abstract
Hallucination is a well-known phenomenon in text generated by large language models (LLMs). Hallucinatory responses are found in almost all application scenarios (e.g., summarization and question answering (QA)). For applications requiring high reliability (e.g., customer-facing assistants), the potential presence of hallucination in LLM-generated text is a critical problem. The amount of hallucination can be reduced by leveraging information retrieval to provide relevant background information to the LLM. However, LLMs can still generate hallucinatory content for various reasons (e.g., prioritizing their parametric knowledge over the context, or failing to capture the relevant information from the context). Detecting hallucinations through automated methods is thus paramount. To facilitate research in this direction, we introduce a sophisticated dataset, DelucionQA, that captures hallucinations made by retrieval-augmented LLMs for a domain-specific QA task. Furthermore, we propose a set of hallucination detection methods to serve as baselines for future work from the research community. Analysis and a case study are also provided to share valuable insights on hallucination phenomena in the target scenario.