MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them

Abstract

Hallucinations pose critical risks for large language model (LLM)-based agents, often manifesting as hallucinative actions that result from fabricated or misinterpreted information within the cognitive context. While recent studies have exposed such failures, existing evaluations remain fragmented and lack a principled testbed. In this paper, we present MIRAGE-Bench (Measuring Illusions in Risky AGEnt settings), the first unified benchmark for eliciting and evaluating hallucinations in interactive LLM-agent scenarios. We begin by introducing a three-part taxonomy of agentic hallucinations: actions that are unfaithful to (i) task instructions, (ii) execution history, or (iii) environment observations. To elicit such failures, we first perform a systematic audit of existing agent benchmarks, then synthesize test cases using a snapshot strategy that isolates decision points in a deterministic and reproducible manner. To evaluate hallucination behaviors, we adopt a fine-grained LLM-as-a-Judge paradigm with tailored risk-aware prompts, enabling scalable, high-fidelity assessment of agent actions without enumerating full action spaces. MIRAGE-Bench provides actionable insights into the failure modes of LLM agents and lays the groundwork for principled progress in mitigating hallucinations in interactive environments.
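The abstract does not specify data formats, so the following is only a minimal Python sketch, under our own assumptions, of how the three-part taxonomy and a snapshot-style test case might be represented and turned into a risk-aware judge prompt. The names `HallucinationType`, `Snapshot`, and `build_judge_prompt`, and the prompt wording, are hypothetical and are not taken from MIRAGE-Bench. The point it illustrates is that a frozen snapshot fixes the instruction, history, and observation, so a judge only has to assess the one action the agent actually took.

```python
from dataclasses import dataclass
from enum import Enum


class HallucinationType(Enum):
    """Three-part taxonomy: the source an action is unfaithful to."""
    TASK_INSTRUCTION = "unfaithful to task instructions"
    EXECUTION_HISTORY = "unfaithful to execution history"
    ENVIRONMENT_OBSERVATION = "unfaithful to environment observations"


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical frozen decision point: everything the agent sees before one action."""
    task_instruction: str
    history: tuple[str, ...]          # prior steps, fixed so replays are deterministic
    observation: str                  # the current environment observation
    risk_focus: HallucinationType     # the failure this snapshot is meant to elicit


def build_judge_prompt(snap: Snapshot, agent_action: str) -> str:
    """Assemble a hypothetical risk-aware prompt for an LLM judge.

    Only the single action taken at this snapshot is judged, so the
    agent's full action space never has to be enumerated.
    """
    history = "\n".join(f"- {step}" for step in snap.history) or "- (none)"
    return (
        f"Task instruction:\n{snap.task_instruction}\n\n"
        f"Execution history:\n{history}\n\n"
        f"Current observation:\n{snap.observation}\n\n"
        f"Agent action:\n{agent_action}\n\n"
        f"Risk to check: is this action {snap.risk_focus.value}? "
        "Answer with a score and a short justification."
    )


if __name__ == "__main__":
    # Illustrative (made-up) case: the instruction names one file, the agent deletes another.
    snap = Snapshot(
        task_instruction="Delete only the file report_draft.txt.",
        history=("ls -> report_draft.txt report_final.txt",),
        observation="Shell prompt is ready.",
        risk_focus=HallucinationType.TASK_INSTRUCTION,
    )
    print(build_judge_prompt(snap, "rm report_final.txt"))
```

Keeping `risk_focus` on the snapshot lets each test case carry its own tailored risk question, which is one simple way to realize the "risk-aware prompts" idea; the actual benchmark may structure this differently.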
