Unraveling Babel: Exploring Multilingual Activation Patterns within Large Language Models

Abstract

Recently, large language models (LLMs) have achieved tremendous breakthroughs in the field of language processing, yet the mechanisms by which they process multiple languages remain poorly understood. Therefore, in this work we study the multilingual activation patterns of LLMs. By transforming the original LLMs into a Mixture of Experts (MoE) architecture, we analyze the expert activation patterns when processing various languages and demonstrate the connections between these activation patterns at the level of language families. We discover the existence of non-language-specific neurons as well as language-specific activation neurons. Further exploration shows that leveraging only high-frequency activation neurons can accelerate inference while maintaining comparable performance. These findings shed light on the multilingual processing mechanism of LLMs and are of significant importance in guiding the multilingual training and model pruning of LLMs.
