Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Abstract

Large language models (LLMs) are known to perform tasks effectively by simply observing a few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be challenging, so unsupervised techniques may be necessary. Moreover, competent generative capabilities of LLMs are observed only in high-resource languages, while their performance in under-represented languages lags behind due to pre-training data imbalance. To elicit LLMs' abilities in low-resource languages without any supervised data, we propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. These prompts are then used to create intra-lingual exemplars to perform tasks in the target languages. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages. We also show that fine-tuning a 7B model on data generated from our method helps it perform competitively with a 175B model. In non-English translation tasks, our method even outperforms supervised prompting by up to 3 chrF++ in many low-resource languages. When evaluated on zero-shot multilingual summarization, our method surpasses other English-pivoting baselines by up to 4 ROUGE-L and is also favored by GPT-4.
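
The two prompting stages described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's actual template: the function names build_diverse_prompt and build_intralingual_prompt, the "Input:/English:" layout, and the placeholder exemplars are all hypothetical. The first function assembles exemplars from several high-resource languages to prompt translation of an arbitrary input into English; the second reuses the resulting (English, target-language) pairs as intra-lingual exemplars for the reverse direction.

```python
from typing import List, Tuple

# Placeholder (source sentence, English translation) pairs from
# high-resource languages. These are illustrative, not data from the paper.
HIGH_RESOURCE_EXEMPLARS: List[Tuple[str, str]] = [
    ("Bonjour le monde.", "Hello world."),   # French
    ("Gracias.", "Thank you."),              # Spanish
    ("Guten Morgen.", "Good morning."),      # German
]


def build_diverse_prompt(exemplars: List[Tuple[str, str]], query: str) -> str:
    """Stage 1 (assumed layout): X -> English prompt from mixed-language exemplars."""
    blocks = [f"Input: {src}\nEnglish: {en}" for src, en in exemplars]
    # The final block is left open so the model completes the English side.
    blocks.append(f"Input: {query}\nEnglish:")
    return "\n\n".join(blocks)


def build_intralingual_prompt(
    pairs: List[Tuple[str, str]], english_query: str, target_name: str = "Target"
) -> str:
    """Stage 2 (assumed layout): English -> target prompt.

    `pairs` holds (English, target-language) sentences, where the English
    side would come from model outputs obtained with the stage-1 prompt.
    """
    blocks = [f"English: {en}\n{target_name}: {tgt}" for en, tgt in pairs]
    blocks.append(f"English: {english_query}\n{target_name}:")
    return "\n\n".join(blocks)
```

In use, unlabeled target-language sentences would be passed one at a time as the query to build_diverse_prompt, the model's English outputs would be paired with those sentences, and the pairs would then be passed to build_intralingual_prompt. The model call itself is omitted because it depends on the LLM API being used.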
