Abstract
Modern language models can process inputs across diverse languages and modalities. We hypothesize that models acquire this capability through learning a shared representation space across heterogeneous data types (e.g., different languages and modalities), which places semantically similar inputs near one another, even if they are from different modalities/languages. We term this the semantic hub hypothesis, following the hub-and-spoke model from neuroscience (Patterson et al., 2007), which posits that semantic knowledge in the human brain is organized through a transmodal semantic "hub" that integrates information from various modality-specific "spoke" regions. We first show that model representations for semantically equivalent inputs in different languages are similar in the intermediate layers, and that this space can be interpreted using the model's dominant pretraining language via the logit lens. This tendency extends to other data types, including arithmetic expressions, code, and visual/audio inputs. Interventions in the shared representation space in one data type also predictably affect model outputs in other data types, suggesting that this shared representation space is not simply a vestigial byproduct of large-scale training on broad data, but something that is actively utilized by the model during input processing.
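As an illustration of the two measurements mentioned above, the following is a minimal sketch, not the procedure used in this work. It assumes hidden states are plain vectors and that the logit lens amounts to multiplying an intermediate hidden state by the output embedding matrix (a real model would typically apply its final normalization layer first, which is omitted here). All names and numbers (`VOCAB`, `UNEMBEDDING`, `h_english`, `h_chinese`) are hypothetical toy values chosen for readability.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def logit_lens(hidden, unembedding, vocab):
    """Project a hidden state onto the vocabulary and return the top token.

    Simplification (assumption): no final layer normalization is applied
    before the projection.
    """
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembedding]
    best = max(range(len(vocab)), key=lambda i: logits[i])
    return vocab[best]


# Toy values, invented for illustration only (not data from the paper).
VOCAB = ["cat", "dog", "car"]
UNEMBEDDING = [
    [1.0, 0.0, 0.0],  # output embedding for "cat"
    [0.0, 1.0, 0.0],  # output embedding for "dog"
    [0.0, 0.0, 1.0],  # output embedding for "car"
]
h_english = [0.9, 0.1, 0.0]  # hypothetical mid-layer state, English input
h_chinese = [0.8, 0.2, 0.1]  # hypothetical mid-layer state, Chinese input

# Similar inputs in different languages should lie close together.
similarity = cosine_similarity(h_english, h_chinese)

# Decode the non-English hidden state into the (English) vocabulary.
decoded = logit_lens(h_chinese, UNEMBEDDING, VOCAB)
```

Under the hypothesis, `similarity` would be high for semantically equivalent inputs, and `decoded` would tend to be a token in the dominant pretraining language even when the input is not.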