fMRI predictors based on language models of increasing complexity recover brain left lateralization

Abstract

Over the past decade, studies of naturalistic language processing where participants are scanned while listening to continuous text have flourished. Using word embeddings at first, then large language models, researchers have created encoding models to analyze the brain signals. Presenting these models with the same text as the participants makes it possible to identify brain areas where there is a significant correlation between the functional magnetic resonance imaging (fMRI) time series and the ones predicted by the models' artificial neurons. One intriguing finding from these studies is that they have revealed highly symmetric bilateral activation patterns, somewhat at odds with the well-known left lateralization of language processing. Here, we report analyses of an fMRI dataset where we manipulate the complexity of large language models, testing 28 pretrained models from 8 different families, ranging from 124M to 14.2B parameters. First, we observe that the performance of models in predicting brain responses follows a scaling law, where the fit with brain activity increases linearly with the logarithm of the number of parameters of the model (and with its performance on natural language processing tasks). Second, we show that a left-right asymmetry gradually appears as model size increases, and that the difference in left-right brain correlations also follows a scaling law. Whereas the smallest models show no asymmetry, larger models fit left-hemisphere activations increasingly better than right-hemisphere ones. This finding reconciles computational analyses of brain activity using large language models with the classic observation from aphasic patients showing left hemisphere dominance for language.
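
The two scaling laws described above can be illustrated with a minimal sketch: a least-squares fit of a brain score against the base-10 logarithm of the parameter count, applied once to left-hemisphere scores and once to the left-minus-right difference. Everything here is an assumption made for illustration: the function name `fit_log_linear`, the two intermediate model sizes, and all score values are synthetic placeholders, not data or code from the study.

```python
import math


def fit_log_linear(n_params, scores):
    """Ordinary least-squares fit of: score = intercept + slope * log10(n_params)."""
    xs = [math.log10(n) for n in n_params]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Model sizes: the endpoints (124M, 14.2B) come from the abstract;
# the two intermediate sizes are hypothetical.
n_params = [124e6, 1.5e9, 7e9, 14.2e9]

# Synthetic placeholder correlations, NOT results from the paper.
# They are constructed so the left hemisphere gains more per decade of parameters.
left = [0.10 + 0.02 * math.log10(n) for n in n_params]
right = [0.10 + 0.01 * math.log10(n) for n in n_params]
asymmetry = [l - r for l, r in zip(left, right)]

left_intercept, left_slope = fit_log_linear(n_params, left)
asym_intercept, asym_slope = fit_log_linear(n_params, asymmetry)

print(f"left-hemisphere slope per decade of parameters: {left_slope:.4f}")
print(f"left-right asymmetry slope per decade of parameters: {asym_slope:.4f}")
```

A positive slope for the asymmetry series is what a growing left lateralization with model size would look like under this simplified description; the actual analysis may differ in how scores are computed and aggregated across voxels.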
