Machine Translation into Low-resource Language Varieties

  • 2021-10-15 18:35:49
  • Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov

Abstract

State-of-the-art machine translation (MT) systems are typically trained to generate the "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language. Such varieties are often low-resource, and hence do not benefit from contemporary NLP solutions, MT included. We propose a general framework to rapidly adapt MT systems to generate language varieties that are close to, but different from, the standard target language, using no parallel (source-variety) data. This also includes adaptation of MT systems to low-resource typologically-related target languages. We experiment with adapting an English-Russian MT system to generate Ukrainian and Belarusian, an English-Norwegian Bokmål system to generate Nynorsk, and an English-Arabic system to generate four Arabic dialects, obtaining significant improvements over competitive baselines.

 
