Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

Abstract

In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to performance gains over bilingually trained models. However, these improvements are not uniform; often multilingual parameter sharing results in a decrease in accuracy due to translation models not being able to accommodate different languages in their limited parameter space. In this work, we examine parameter sharing techniques that strike a happy medium between full sharing and individual training, specifically focusing on the self-attentional Transformer model. We find that the full parameter sharing approach leads to increases in BLEU scores mainly when the target languages are from a similar language family. However, even in the case where target languages are from different families where full parameter sharing leads to a noticeable drop in BLEU scores, our proposed methods for partial sharing of parameters can lead to substantial improvements in translation accuracy.
