From Language to Language-ish: How Brain-Like is an LSTM's Representation of Nonsensical Language Stimuli?

Abstract

The representations generated by many models of language (word embeddings, recurrent neural networks and transformers) correlate with brain activity recorded while people read. However, these decoding results are usually based on the brain's reaction to syntactically and semantically sound language stimuli. In this study, we asked: how does an LSTM (long short-term memory) language model, trained (by and large) on semantically and syntactically intact language, represent a language sample with degraded semantic or syntactic information? Does the LSTM representation still resemble the brain's reaction? We found that, even for some kinds of nonsensical language, there is a statistically significant relationship between the brain's activity and the representations of an LSTM. This indicates that, at least in some instances, LSTMs and the human brain handle nonsensical data similarly.
