Abstract
Instruction tuning a large language model with multiple languages can prepare it for multilingual downstream tasks. Nonetheless, it is yet to be determined whether a handful of languages is sufficient, or whether the benefits increase with the inclusion of more. By fine-tuning large multilingual models on 1 to 52 languages, we present a case study on BLOOM to understand three pertinent factors affecting performance: the number of languages, language exposure, and similarity between training and test languages. Overall, we found that 1) expanding language coverage in multilingual instruction tuning proves to be beneficial; 2) accuracy often improves significantly if the test language appears in the instruction mixture; 3) languages' genetic features correlate with cross-lingual transfer more than the mere number of languages does, but different languages benefit to various degrees.