The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units

Abstract

Large language models (LLMs) exhibit remarkable capabilities on not just language tasks, but also various tasks that are not linguistic in nature, such as logical reasoning and social inference. In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing. We here ask whether similar specialization for language emerges in LLMs. We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience. We then establish the causal role of these units by demonstrating that ablating LLM language-selective units -- but not random units -- leads to drastic deficits in language tasks. Correspondingly, language-selective LLM units are more aligned to brain recordings from the human language system than random units. Finally, we investigate whether our localization method extends to other cognitive domains: while we find specialized networks in some LLMs for reasoning and social capabilities, there are substantial differences among models. These findings provide functional and causal evidence for specialization in large language models, and highlight parallels with the functional organization in the brain.
