How Multilingual Are Large Language Models Fine-Tuned for Translation?

Abstract

A new paradigm for machine translation has recently emerged: fine-tuning large language models (LLM) on parallel text has been shown to outperform dedicated translation systems trained in a supervised fashion on much larger amounts of parallel data (Xu et al., 2024a; Alves et al., 2024). However, it remains unclear whether this paradigm can enable massively multilingual machine translation or whether it requires fine-tuning dedicated models for a small number of language pairs. How does translation fine-tuning impact the MT capabilities of LLMs for zero-shot languages, zero-shot language pairs, and translation tasks that do not involve English? To address these questions, we conduct an extensive empirical evaluation of the translation quality of the TOWER family of language models (Alves et al., 2024) on 132 translation tasks from the multi-parallel FLORES-200 data. We find that translation fine-tuning improves translation quality even for zero-shot languages on average, but that the impact is uneven depending on the language pairs involved. These results call for further research to effectively enable massively multilingual translation with LLMs.
