Abstract
Recent advances in large language model (LLM) pruning have shown state-of-the-art compression results in post-training and retraining-free settings while maintaining high predictive performance. However, such research mainly considers calibrating pruning using English text, despite the multilingual nature of modern LLMs and their frequent use in non-English languages. In this paper, we set out to explore effective strategies for calibrating the pruning of multilingual language models. We present the first comprehensive empirical study, comparing different calibration languages for pruning multilingual models across diverse tasks, models, and state-of-the-art pruning techniques. Our results offer practical suggestions: for example, calibrating in the target language can efficiently yield lower perplexity, but does not necessarily benefit downstream tasks. Our further analysis reveals that calibration in the target language mainly contributes to preserving language-specific features related to fluency and coherence, but might not contribute to capturing language-agnostic features such as language understanding and reasoning. Finally, we provide practical recommendations for future practitioners.