Language verY Rare for All

Abstract

In the quest to overcome language barriers, encoder-decoder models like NLLB have expanded machine translation to rare languages, with some models (e.g., NLLB 1.3B) even trainable on a single GPU. While general-purpose LLMs perform well in translation, open LLMs prove highly competitive when fine-tuned for specific tasks involving unknown corpora. We introduce LYRA (Language verY Rare for All), a novel approach that combines open LLM fine-tuning, retrieval-augmented generation (RAG), and transfer learning from related high-resource languages. To ease adoption, the study is restricted to single-GPU training. It focuses on two-way translation between French and Monégasque, a rare language unsupported by existing translation tools due to limited corpus availability. Our results demonstrate LYRA's effectiveness: it consistently matches, and frequently surpasses, state-of-the-art encoder-decoder models in rare language translation.
