Testing the Predictions of Surprisal Theory in 11 Languages

Abstract

A fundamental result in psycholinguistics is that less predictable words take a longer time to process. One theoretical explanation for this finding is Surprisal Theory (Hale, 2001; Levy, 2008), which quantifies a word's predictability as its surprisal, i.e. its negative log-probability given a context. While evidence supporting the predictions of Surprisal Theory has been replicated widely, most studies have focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with Surprisal Theory: (i) whether surprisal is predictive of reading times; (ii) whether expected surprisal, i.e. contextual entropy, is predictive of reading times; and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.
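To make the two predictor quantities concrete, the sketch below computes a word's surprisal, -log p(w | context), and the contextual entropy, i.e. the expected surprisal over the next-word distribution, -Σ p(w) log p(w). It is a minimal illustration, not the paper's pipeline: the function names, the toy vocabulary and its probabilities are hypothetical, and measuring in bits (log base 2) is an assumption, since the abstract does not state a log base. In the study itself, these probabilities would come from a language model's next-word distribution.

```python
import math


def surprisal(p: float) -> float:
    """Surprisal of an outcome with probability p, in bits: -log2 p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


def contextual_entropy(dist: dict[str, float]) -> float:
    """Expected surprisal over a next-word distribution: -sum_w p(w) log2 p(w)."""
    # Zero-probability words contribute nothing to the expectation, so skip them.
    return sum(p * surprisal(p) for p in dist.values() if p > 0.0)


# Hypothetical next-word distribution p(w | context) over a toy vocabulary;
# in practice these probabilities would be read off a language model.
next_word = {"dog": 0.5, "cat": 0.25, "bird": 0.125, "fish": 0.125}

print(surprisal(next_word["dog"]))    # 1.0 (bits): a likely word is unsurprising
print(surprisal(next_word["fish"]))   # 3.0 (bits): a less likely word is more surprising
print(contextual_entropy(next_word))  # 1.75 (bits): uncertainty before the word is seen
```

Surprisal is a property of the word actually read, while contextual entropy depends only on the context, which is why the two can be tested as separate predictors of reading time.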