Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders

Abstract

Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation. Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules, each of which is for a language or language family. The non-shared architecture has the advantage of mitigating internal language competition, especially when the shared vocabulary and model parameters are restricted in their size. However, the performance of using multiple encoders and decoders on zero-shot translation still lags behind universal NMT. In this work, we study zero-shot translation using language-specific encoders-decoders. We propose to generalize the non-shared architecture and universal NMT by differentiating the Transformer layers between language-specific and interlingua. By selectively sharing parameters and applying cross-attentions, we explore maximizing the representation universality and realizing the best alignment of language-agnostic information. We also introduce a denoising auto-encoding (DAE) objective to jointly train the model with the translation task in a multi-task manner. Experiments on two public multilingual parallel datasets show that our proposed model achieves competitive or better results than universal NMT and a strong pivot baseline. Moreover, we experiment with incrementally adding a new language to the trained model by only updating the new model parameters. With this little effort, the zero-shot translation between this newly added language and existing languages achieves results comparable to those of the model trained jointly from scratch on all languages.
