Abstract
Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales logarithmically with model size from 125M to 30B parameter models, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar logarithmic behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable improvements with model size. A noise ceiling analysis of these large, high-performance encoding models showed that performance is nearing the theoretical maximum for brain areas such as the precuneus and higher auditory cortex. These results suggest that increasing scale in both models and data will yield incredibly effective models of language processing in the brain, enabling better scientific understanding as well as applications such as decoding.
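
To make "scales logarithmically" concrete: under such a law, encoding performance grows linearly in the logarithm of the parameter count, so each tenfold increase in model size adds roughly the same increment. The sketch below fits that form by least squares. It is a minimal illustration only: the functional form r = a + b * log10(N), the function name `fit_log_scaling`, and all numbers are assumptions introduced here, and the values are synthetic, not results from the study.

```python
import numpy as np


def fit_log_scaling(param_counts, scores):
    """Least-squares fit of score = a + b * log10(param_count).

    Returns (a, b), where b is the change in score per tenfold
    increase in parameter count.
    """
    x = np.log10(np.asarray(param_counts, dtype=float))
    y = np.asarray(scores, dtype=float)
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns highest degree first
    return a, b


# Synthetic, hypothetical values for illustration only (NOT the paper's data).
param_counts = [125e6, 1.3e9, 6.7e9, 30e9]
a_true, b_true = 0.10, 0.02
scores = [a_true + b_true * np.log10(n) for n in param_counts]

a, b = fit_log_scaling(param_counts, scores)
print(f"intercept a = {a:.4f}, gain per 10x parameters b = {b:.4f}")
```

With real measurements, the points would scatter around the fitted line, and systematic curvature in the residuals would indicate a departure from purely logarithmic scaling.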