Understanding language-elicited EEG data by predicting it from a fine-tuned language model

Abstract

Electroencephalography (EEG) recordings of brain activity taken while participants read or listen to language are widely used within the cognitive neuroscience and psycholinguistics communities as a tool to study language comprehension. Several time-locked stereotyped EEG responses to word-presentations -- known collectively as event-related potentials (ERPs) -- are thought to be markers for semantic or syntactic processes that take place during comprehension. However, the characterization of each individual ERP in terms of what features of a stream of language trigger the response remains controversial. Improving this characterization would make ERPs a more useful tool for studying language comprehension. We take a step towards better understanding the ERPs by fine-tuning a language model to predict them. This new approach to analysis shows for the first time that all of the ERPs are predictable from embeddings of a stream of language. Prior work has only found two of the ERPs to be predictable. In addition to this analysis, we examine which ERPs benefit from sharing parameters during joint training. We find that two pairs of ERPs previously identified in the literature as being related to each other benefit from joint training, while several other pairs of ERPs that benefit from joint training are suggestive of potential relationships. Extensions of this analysis that further examine what kinds of information in the model embeddings relate to each ERP have the potential to elucidate the processes involved in human language comprehension.
