Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters

Abstract

Adapter layers are lightweight, learnable units inserted between transformer layers. Recent work has explored using such layers in neural machine translation (NMT) to adapt pre-trained models to new domains or language pairs, training only a small set of parameters for each new setting (language pair or domain). In this work we study the compositionality of language and domain adapters in the context of machine translation. We study 1) parameter-efficient adaptation to multiple domains and languages simultaneously (full-resource scenario) and 2) cross-lingual transfer in domains where parallel data is unavailable for certain language pairs (partial-resource scenario). We find that in the partial-resource scenario, a naive combination of domain-specific and language-specific adapters often results in "catastrophic forgetting" of the missing languages. We study other ways to combine the adapters that alleviate this issue and maximize cross-lingual transfer. With our best adapter combinations, we obtain improvements of 3-4 BLEU on average for source languages that do not have in-domain data. For target languages without in-domain data, we achieve a similar improvement by combining adapters with back-translation. Supplementary material is available at https://tinyurl.com/r66stbxj
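To make the adapter idea concrete, the following is a minimal pure-Python sketch of a residual bottleneck adapter, a common design in the adapter literature and not necessarily the exact architecture used in this work. The class name BottleneckAdapter, the zero-initialised up-projection, the layer sizes and the "language adapter first, then domain adapter" stacking order are illustrative assumptions. Plain stacking of this kind is one example of the naive combination that the abstract reports can cause catastrophic forgetting in the partial-resource scenario.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute w @ x + b for a row-major matrix w."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: h + W_up . ReLU(W_down . h + b_down) + b_up.

    Only these parameters would be trained; the pre-trained transformer stays frozen.
    """

    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection: d_model -> d_bottleneck, small random init.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(d_bottleneck)]
        self.b_down = [0.0] * d_bottleneck
        # Up-projection: d_bottleneck -> d_model. Zero init (an assumption) makes a
        # fresh adapter an identity map, so inserting it leaves the model unchanged.
        self.w_up = [[0.0] * d_bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.w_down, h, self.b_down)]  # ReLU
        delta = matvec(self.w_up, z, self.b_up)
        return [hi + di for hi, di in zip(h, delta)]  # residual connection

    def num_params(self) -> int:
        d_b, d = len(self.w_down), len(self.w_up)
        return 2 * d * d_b + d_b + d  # two weight matrices plus two bias vectors


def stacked(h: Vector, lang: BottleneckAdapter, domain: BottleneckAdapter) -> Vector:
    """Naive composition (illustrative): language adapter, then domain adapter."""
    return domain(lang(h))


# One adapter per language and per domain; keys are hypothetical.
adapters = {
    "lang:de": BottleneckAdapter(8, 2, seed=1),
    "domain:medical": BottleneckAdapter(8, 2, seed=2),
}
h = [0.1 * i for i in range(8)]
out = stacked(h, adapters["lang:de"], adapters["domain:medical"])
```

With hypothetical sizes d_model = 512 and a bottleneck of 64, one adapter has 2·512·64 + 64 + 512 = 66,112 trainable parameters. This small per-adapter cost is what makes training a separate adapter for each language and each domain cheap.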
