Universal Neural Machine Translation for Extremely Low Resource Languages

  • 2018-02-15 00:35:08
  • Jiatao Gu, Hany Hassan, Jacob Devlin, Victor O. K. Li
  • 23

Abstract

In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our approach uses transfer learning to share lexical and sentence-level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multi-lingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher-resource languages. Our approach is able to achieve 23 BLEU on Romanian-English WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU of a strong baseline system which uses multi-lingual training and back-translation.

 
