Abstract
Large Language Models (LLMs) have ushered in a new era in Natural Language Processing, but their massive size demands effective compression techniques for practicality. Although numerous model compression techniques have been investigated, they typically rely on a calibration set that overlooks the multilingual context and results in significant accuracy degradation for low-resource languages. This paper introduces Multilingual Brain Surgeon (MBS), a novel calibration data sampling method for multilingual LLM compression. MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages proportionally to the language distribution of the model training datasets. Our experiments, conducted on the BLOOM multilingual LLM, demonstrate that MBS improves the performance of existing English-centric compression methods, especially for low-resource languages. We also uncover the dynamics of language interaction during compression, revealing that the larger the proportion of a language in the training set and the more similar the language is to the calibration language, the better the performance that language retains after compression. In conclusion, MBS presents an innovative approach to compressing multilingual LLMs, addressing the performance disparities and improving the language inclusivity of existing compression techniques.
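The proportional sampling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `allocate_by_proportion` and `sample_calibration_set`, the largest-remainder rounding, and the toy language shares in the usage note are assumptions chosen for illustration, not values reported for BLOOM.

```python
import random
from typing import Dict, List


def allocate_by_proportion(proportions: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` calibration segments across languages by training-set share.

    Largest-remainder rounding (an assumption of this sketch) keeps the
    per-language counts summing exactly to `total`.
    """
    norm = sum(proportions.values())
    if norm <= 0:
        raise ValueError("proportions must sum to a positive value")
    exact = {lang: total * p / norm for lang, p in proportions.items()}
    counts = {lang: int(x) for lang, x in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining segments to the languages with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda lang: exact[lang] - counts[lang], reverse=True)
    for lang in by_remainder[:leftover]:
        counts[lang] += 1
    return counts


def sample_calibration_set(
    corpora: Dict[str, List[str]],
    proportions: Dict[str, float],
    total: int,
    seed: int = 0,
) -> List[str]:
    """Draw a mixed-language calibration set from per-language text pools."""
    rng = random.Random(seed)
    counts = allocate_by_proportion(proportions, total)
    calibration: List[str] = []
    for lang, n in counts.items():
        pool = corpora[lang]
        if n > len(pool):
            raise ValueError(f"not enough segments for language {lang!r}")
        # Sample without replacement within each language.
        calibration.extend(rng.sample(pool, n))
    rng.shuffle(calibration)
    return calibration
```

For example, with hypothetical shares `{"en": 5, "fr": 3, "sw": 2}` and a budget of 7 segments, the allocation gives English 4 segments, French 2, and Swahili 1. An English-only calibration set would instead assign all 7 segments to English.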