VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models

Abstract

Video anomaly understanding (VAU) aims to provide detailed interpretation and semantic comprehension of anomalous events within videos, addressing limitations of traditional methods that focus solely on detecting and localizing anomalies. However, existing approaches often neglect the deeper causal relationships and interactions between objects, which are critical for understanding anomalous behaviors. In this paper, we propose VADER, an LLM-driven framework for Video Anomaly unDErstanding, which integrates keyframe object Relation features with visual cues to enhance anomaly comprehension from video. Specifically, VADER first applies an Anomaly Scorer to assign per-frame anomaly scores, followed by a Context-AwarE Sampling (CAES) strategy to capture the causal context of each anomalous event. A Relation Feature Extractor and a COntrastive Relation Encoder (CORE) jointly model dynamic object interactions, producing compact relational representations for downstream reasoning. These visual and relational cues are integrated with LLMs to generate detailed, causally grounded descriptions and support robust anomaly-related question answering. Experiments on multiple real-world VAU benchmarks demonstrate that VADER achieves strong results across anomaly description, explanation, and causal reasoning tasks, advancing the frontier of explainable video anomaly analysis.
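
To make the score-then-sample idea concrete, the following is a minimal sketch of how per-frame anomaly scores could drive keyframe selection that keeps frames before and after each anomalous frame. It is not the paper's CAES procedure, whose details are not given in the abstract. The function name `sample_context_frames`, the fixed `threshold`, and the symmetric `context` window are all assumptions made for illustration.

```python
from typing import List


def sample_context_frames(
    scores: List[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
    context: int = 2,        # hypothetical number of frames kept on each side
) -> List[int]:
    """Return sorted indices of frames scoring at or above `threshold`,
    together with up to `context` neighbouring frames on each side."""
    n = len(scores)
    selected = set()
    for i, score in enumerate(scores):
        if score >= threshold:
            # Clamp the window to the valid frame range.
            start = max(0, i - context)
            end = min(n - 1, i + context)
            selected.update(range(start, end + 1))
    return sorted(selected)


# Frames 3 and 4 exceed the threshold; one frame of context is kept on each side.
frame_scores = [0.1, 0.1, 0.2, 0.9, 0.8, 0.1, 0.1, 0.1]
print(sample_context_frames(frame_scores, threshold=0.5, context=1))  # [2, 3, 4, 5]
```

Keeping the preceding frames is what would let a downstream model see what led up to an event, and the following frames what resulted from it. A real system would likely choose the window adaptively rather than with a fixed size.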
