Machine Translation Advancements of Low-Resource Indian Languages by Transfer Learning

Abstract

This paper introduces the submission by Huawei Translation Center (HW-TSC) to the WMT24 Indian Languages Machine Translation (MT) Shared Task. To develop a reliable machine translation system for low-resource Indian languages, we employed two distinct knowledge transfer strategies, taking into account the characteristics of the language scripts and the support available from existing open-source models for Indian languages. For Assamese (as) and Manipuri (mn), we fine-tuned the existing IndicTrans2 open-source model to enable bidirectional translation between English and these languages. For Khasi (kh) and Mizo (mz), we trained a multilingual model as a baseline using bilingual data from these four language pairs, along with approximately 8kw of additional English-Bengali bilingual data, all of which share certain linguistic features. This was followed by fine-tuning to achieve bidirectional translation between English and Khasi, as well as English and Mizo. Our transfer learning experiments produced impressive results: 23.5 BLEU for en-as, 31.8 BLEU for en-mn, 36.2 BLEU for as-en, and 47.9 BLEU for mn-en on their respective test sets. Similarly, the multilingual model transfer learning experiments yielded impressive outcomes, achieving 19.7 BLEU for en-kh, 32.8 BLEU for en-mz, 16.1 BLEU for kh-en, and 33.9 BLEU for mz-en on their respective test sets. These results not only highlight the effectiveness of transfer learning techniques for low-resource languages but also contribute to advancing machine translation capabilities for low-resource Indian languages.
