Improving Multilingual Neural Machine Translation For Low-Resource Languages: French, English - Vietnamese

Abstract

Prior work has demonstrated that a low-resource language pair can benefit from multilingual machine translation (MT) systems, which rely on the joint training of many language pairs. This paper proposes two simple strategies to address the rare-word issue in multilingual MT systems for two low-resource language pairs: French-Vietnamese and English-Vietnamese. The first strategy dynamically learns the similarity of tokens in the shared space among source languages, while the second attempts to improve the translation of rare words by updating their embeddings during training. In addition, we leverage monolingual data for multilingual MT systems to increase the amount of synthetic parallel corpora and to deal with the data sparsity problem. We show significant improvements of up to +1.62 and +2.54 BLEU points over the bilingual baseline systems for the two language pairs, and we release our datasets to the research community.
