Quantifying Multilingual Performance of Large Language Models Across Languages

Abstract

The training of Large Language Models (LLMs) requires extensive text corpora. However, these data are often unevenly distributed across languages. As a result, LLMs perform well on common languages, such as English, German, and French, but poorly on low-resource languages. Yet no existing work quantitatively measures the performance of LLMs in low-resource languages. To fill this gap, we propose the Language Ranker, which aims to benchmark and rank languages according to how well LLMs perform on them. We use an LLM's performance on an English corpus as a baseline and compare its performance in every other language against that baseline. Our three main findings are: 1. Different LLMs produce roughly the same performance ranking across all languages. 2. LLMs of different sizes share the same partial order of performance. 3. There is a strong correlation between Llama 2's performance in a language and that language's proportion of the pre-training corpus. These findings illustrate that the Language Ranker can be used as an indicator of the language performance of LLMs.
