Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

  • 2021-05-31 16:01:18
  • Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab

Abstract

The scarcity of parallel data is a major obstacle to training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translating to and from a low-resource language with only monolingual data, in addition to any parallel data in the related high-resource language. Our method, NMT-Adapt, combines denoising autoencoding, back-translation, and adversarial objectives to utilize monolingual data for low-resource adaptation. We experiment on 7 languages from three different language families and show that our technique significantly improves translation into low-resource languages compared to other translation baselines.
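Denoising autoencoding, one of the three objectives named above, trains the model to reconstruct a clean monolingual sentence from a corrupted copy. As a rough illustration of what that objective consumes, the sketch below corrupts a sentence with word dropout and a bounded local shuffle, a noise model commonly used in unsupervised NMT. The function `add_noise` and its parameters `p_drop` and `k` are hypothetical; this is not necessarily the noise function NMT-Adapt uses.

```python
import random
from typing import List, Optional


def add_noise(tokens: List[str], p_drop: float = 0.1, k: int = 3,
              rng: Optional[random.Random] = None) -> List[str]:
    """Corrupt a sentence for a denoising-autoencoding objective.

    Hypothetical sketch: word dropout plus local shuffling. The paper's
    actual noise model may differ.
    """
    if rng is None:
        rng = random.Random()
    # 1) Word dropout: drop each token independently with probability p_drop,
    #    keeping one random token if everything was dropped.
    kept = [t for t in tokens if rng.random() >= p_drop]
    if not kept and tokens:
        kept = [rng.choice(tokens)]
    # 2) Local shuffle: add uniform noise in [0, k + 1) to each position and
    #    sort by the noisy key, so no token moves more than k positions.
    alpha = k + 1
    keys = [i + alpha * rng.random() for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    noisy = add_noise(sentence, rng=random.Random(0))
    # Training pair for the denoising objective: (noisy input, clean target).
    print(noisy, "->", sentence)
```

Bounding the shuffle to `k` positions keeps the corrupted sentence close enough to the original that reconstructing it still requires modeling local word order, rather than recovering an arbitrary permutation.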
