Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes

Abstract

Humour, as a complex language form, is derived from myriad aspects of life, whilst existing work on computational humour has focussed almost exclusively on short pun-based jokes. In this work, we investigate whether the ability of Large Language Models (LLMs) to explain humour depends on the particular humour form. We compare models on simple puns and more complex topical humour that requires knowledge of real-world entities and events. In doing so, we curate a dataset of 600 jokes split across 4 joke types and manually write high-quality explanations. These jokes include heterographic and homographic puns, contemporary internet humour, and topical jokes, where understanding relies on reasoning beyond "common sense", rooted instead in world knowledge regarding news events and pop culture. Using this dataset, we compare the zero-shot abilities of a range of LLMs to accurately and comprehensively explain jokes of different types, identifying key research gaps in the task of humour explanation. We find that none of the tested models (inc. reasoning models) are capable of reliably generating adequate explanations of all joke types, further highlighting the narrow focus of most works in computational humour on overly simple joke forms.
