Abstract
Negation is a fundamental linguistic phenomenon that poses persistent challenges for Large Language Models (LLMs), particularly in tasks requiring deep semantic understanding. Existing benchmarks often treat negation as a side case within broader tasks like natural language inference, resulting in a lack of benchmarks that exclusively target negation understanding. In this work, we introduce Thunder-NUBench, a novel benchmark explicitly designed to assess sentence-level negation understanding in LLMs. Thunder-NUBench goes beyond surface-level cue detection by contrasting standard negation with structurally diverse alternatives such as local negation, contradiction, and paraphrase. The benchmark consists of manually curated sentence-negation pairs and a multiple-choice dataset that enables in-depth evaluation of models' negation understanding.