Abstract
Multilingual neural machine translation (MNMT) learns to translate multiple language pairs with a single model, potentially improving both the accuracy and the memory-efficiency of deployed models. However, the heavy data imbalance between languages hinders the model from performing uniformly across language pairs. In this paper, we propose a new learning objective for MNMT based on distributionally robust optimization, which minimizes the worst-case expected loss over the set of language pairs. We further show how to practically optimize this objective for large translation corpora using an iterated best response scheme, which is both effective and incurs negligible additional computational cost compared to standard empirical risk minimization. We perform extensive experiments on three sets of languages from two datasets and show that our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
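As a rough illustration of the kind of objective described above, a worst-case loss over language pairs can be written in a generic min-max form. The notation here is assumed for illustration and is not taken from the abstract: $\theta$ denotes the model parameters, $N$ the number of language pairs, $\mathcal{L}_i(\theta)$ the expected loss on language pair $i$, and $\mathcal{Q}$ an uncertainty set of distributions over language pairs whose exact form is left unspecified.

```latex
% Generic distributionally robust objective (illustrative notation, assumed):
% minimize over model parameters the worst-case weighted loss over language pairs.
\min_{\theta} \; \max_{q \in \mathcal{Q}} \; \sum_{i=1}^{N} q_i \, \mathcal{L}_i(\theta)

% One possible reading of an iterated best response scheme at round t:
% first pick the adversarial weights for the current model,
% then (approximately) minimize the loss reweighted by those weights.
q^{(t)} \in \arg\max_{q \in \mathcal{Q}} \sum_{i=1}^{N} q_i \, \mathcal{L}_i\big(\theta^{(t-1)}\big),
\qquad
\theta^{(t)} \approx \arg\min_{\theta} \sum_{i=1}^{N} q^{(t)}_i \, \mathcal{L}_i(\theta)
```

Under this reading, taking $\mathcal{Q}$ to be a single fixed distribution would recover a standard weighted empirical risk, while a larger $\mathcal{Q}$ places more weight on the language pairs with the highest current loss.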