ePiC: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding

Abstract

While large language models have shown exciting progress on several NLP benchmarks, evaluating their ability for complex analogical reasoning remains under-explored. Here, we introduce a high-quality crowdsourced dataset of narratives for employing proverbs in context as a benchmark for abstract language understanding. The dataset provides fine-grained annotation of aligned spans between proverbs and narratives, and contains minimal lexical overlaps between narratives and proverbs, ensuring that models need to go beyond surface-level reasoning to succeed. We explore three tasks: (1) proverb recommendation and alignment prediction, (2) narrative generation for a given proverb and topic, and (3) identifying narratives with similar motifs. Our experiments show that neural language models struggle in our tasks compared to humans, and the tasks pose multiple learning challenges.
