Do Large Language Models Mirror Cognitive Language Processing?

Abstract

Large Language Models (LLMs) have demonstrated remarkable abilities in text comprehension and logical reasoning, indicating that the text representations learned by LLMs can facilitate their language processing capabilities. In cognitive science, brain cognitive processing signals are typically utilized to study human language processing. It is therefore natural to ask how well the text embeddings from LLMs align with brain cognitive processing signals, and how training strategies affect this LLM-brain alignment. In this paper, we employ Representational Similarity Analysis (RSA) to measure the alignment between 23 mainstream LLMs and fMRI signals of the brain, in order to evaluate how effectively LLMs simulate cognitive language processing. We empirically investigate the impact of various factors (e.g., pre-training data size, model scaling, alignment training, and prompts) on this LLM-brain alignment. Experimental results indicate that pre-training data size and model scaling are positively correlated with LLM-brain similarity, and that alignment training can significantly improve LLM-brain similarity. Explicit prompts contribute to the consistency of LLMs with brain cognitive language processing, while nonsensical noisy prompts may attenuate such alignment. Additionally, performance on a wide range of LLM evaluations (e.g., MMLU, Chatbot Arena) is highly correlated with LLM-brain similarity.
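The abstract names RSA but does not spell out its configuration. The following is a minimal sketch of generic RSA, assuming 1 minus Pearson r as the dissimilarity between stimulus representations and Spearman rank correlation to compare the two resulting dissimilarity matrices (common choices, but not confirmed here as the paper's exact setup). Row i of the model input and of the brain input must describe the same stimulus (e.g., the same sentence); the two sides may have different dimensionalities (hidden units vs. voxels), which is what makes RSA useful for comparing them. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def pearson(x: Vector, y: Vector) -> float:
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rdm_upper(reps: List[Vector]) -> List[float]:
    """Upper triangle (i < j) of a representational dissimilarity matrix,
    using 1 - Pearson r as the (assumed) dissimilarity between stimuli."""
    n = len(reps)
    return [1.0 - pearson(reps[i], reps[j]) for i in range(n) for j in range(i + 1, n)]


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def rsa_score(model_reps: List[Vector], brain_reps: List[Vector]) -> float:
    """Second-order similarity: Spearman correlation between the model RDM
    and the brain RDM, both built over the same ordered set of stimuli."""
    assert len(model_reps) == len(brain_reps) >= 3
    return spearman(rdm_upper(model_reps), rdm_upper(brain_reps))


if __name__ == "__main__":
    # Hypothetical toy data: 4 stimuli, 4-d "model" vectors, 3-d "brain" vectors.
    model = [[1, 2, 3, 4], [2, 1, 4, 3], [4, 3, 2, 1], [1, 3, 2, 5]]
    brain = [[0.9, 2.1, 3.2], [2.2, 0.8, 3.9], [3.1, 2.0, 0.7], [1.0, 2.8, 2.5]]
    print(rsa_score(model, model))  # close to 1.0: identical representational geometry
    print(rsa_score(model, brain))  # some value in [-1, 1]
```

Comparing dissimilarity matrices rather than raw vectors is the design choice that lets a model's embedding space and fMRI responses be related without learning any mapping between them; a higher score means the two systems treat the same pairs of stimuli as similar or dissimilar.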
