Massively Multilingual Text Translation For Low-Resource Languages

  • 2024-01-29 21:33:08
  • Zhong Zhou

Abstract

Translation into severely low-resource languages has both the cultural goal of saving and reviving those languages and the humanitarian goal of assisting the everyday needs of local communities that are accelerated by the recent COVID-19 pandemic. In many humanitarian efforts, translation into severely low-resource languages often does not require a universal translation engine, but a dedicated text-specific translation engine. For example, healthcare records, hygienic procedures, government communication, emergency procedures and religious texts are all limited texts. While generic translation engines for all languages do not exist, translation of multilingually known limited texts into new, low-resource languages may be possible and reduce human translation effort. We attempt to leverage translation resources from rich-resource languages to efficiently produce the best possible translation quality for well-known texts, which are available in multiple languages, in a new, low-resource language. To reach this goal, we argue that in translating a closed text into low-resource languages, generalization to out-of-domain texts is not necessary, but generalization to new languages is. Performance gain comes from massive source parallelism by careful choice of close-by language families, style-consistent corpus-level paraphrases within the same language and strategic adaptation of existing large pretrained multilingual models to the domain first and then to the language. Such performance gain makes it possible for machine translation systems to collaborate with human translators to expedite the translation process into new, low-resource languages.

 
