Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information

Abstract

We investigate the following question for machine translation (MT): can we develop a single universal MT model to serve as the common seed and obtain derivative and improved models on arbitrary language pairs? We propose mRASP, an approach to pre-train a universal multilingual neural machine translation model. Our key idea in mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space. We pre-train an mRASP model on 32 language pairs jointly with only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models. We carry out extensive experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs. Experimental results demonstrate that mRASP achieves significant performance improvements compared to directly training on those target pairs. This is the first work to verify that multiple low-resource language pairs can be utilized to improve rich-resource MT. Surprisingly, mRASP is even able to improve the translation quality on exotic languages that never occur in the pre-training corpus. Code, data, and pre-trained models are available at https://github.com/linzehui/mRASP.
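
The abstract names random aligned substitution but does not spell out its mechanics. The following is a minimal sketch of one plausible reading, not the released implementation: each source token that has an entry in a bilingual dictionary is swapped, with some probability, for a randomly chosen translation, so that words with similar meanings share contexts during pre-training. The function name, the toy dictionary, and the `prob` parameter are all assumptions made for illustration.

```python
import random


def random_aligned_substitution(tokens, dictionary, prob=0.3, rng=None):
    """Swap dictionary-covered tokens for a random translation.

    tokens:     list of source-side tokens
    dictionary: hypothetical mapping from a token to a list of
                translations in other languages
    prob:       assumed per-token replacement probability
    rng:        optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    output = []
    for token in tokens:
        candidates = dictionary.get(token)
        # Replace only tokens that have at least one dictionary entry.
        if candidates and rng.random() < prob:
            output.append(rng.choice(candidates))
        else:
            output.append(token)
    return output


# Toy English-to-French dictionary, invented for this example.
toy_dictionary = {"singing": ["chanter"], "dancing": ["danser"]}
sentence = "I like singing and dancing".split()

# A seeded generator makes the code-switched output repeatable.
mixed = random_aligned_substitution(
    sentence, toy_dictionary, prob=0.5, rng=random.Random(0)
)
print(" ".join(mixed))
```

Under this reading, the code-switched sentence would be paired with the unchanged target sentence during pre-training, while fine-tuning would use ordinary parallel data.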
