Can Language Models Be Tricked by Language Illusions? Easier with Syntax, Harder with Semantics

Abstract

Language models (LMs) have been argued to overlap substantially with human beings in grammaticality judgment tasks. But when humans systematically make errors in language processing, should we expect LMs to behave like cognitive models of language and mimic human behavior? We answer this question by investigating LMs' more subtle judgments associated with "language illusions": sentences that are vague in meaning, implausible, or ungrammatical but receive unexpectedly high acceptability judgments by humans. We looked at three illusions: the comparative illusion (e.g. "More people have been to Russia than I have"), the depth-charge illusion (e.g. "No head injury is too trivial to be ignored"), and the negative polarity item (NPI) illusion (e.g. "The hunter who no villager believed to be trustworthy will ever shoot a bear"). We found that probabilities represented by LMs were more likely to align with human judgments of being "tricked" by the NPI illusion, which examines a structural dependency, compared to the comparative and the depth-charge illusions, which require sophisticated semantic understanding. No single LM or metric yielded results that are entirely consistent with human behavior. Ultimately, we show that LMs are limited both in their construal as cognitive models of human language processing and in their capacity to recognize nuanced but critical information in complicated language materials.
