Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Abstract

Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (X-Mod) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
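
The capacity claim above can be illustrated with simple arithmetic. The following is a minimal sketch, not the authors' implementation: it assumes a model made of one set of shared weights plus one module per language, and the function name `parameter_counts` and all parameter sizes are hypothetical values chosen only for illustration. Under that assumption, total capacity grows linearly with the number of languages, while the parameters involved for any single language stay fixed.

```python
def parameter_counts(shared_params: int, module_params: int, num_languages: int) -> dict:
    """Count parameters for a model with shared weights plus one module per language.

    Assumption (not from the paper): every language module has the same size,
    and a given language only ever uses the shared weights and its own module.
    """
    if num_languages < 1:
        raise ValueError("num_languages must be at least 1")
    # Total capacity: shared weights plus one module for each language.
    total = shared_params + module_params * num_languages
    # Parameters touched by a single language: shared weights plus its own module.
    per_language = shared_params + module_params
    return {"total": total, "per_language": per_language}


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the trend.
    for n in (10, 50, 100):
        counts = parameter_counts(
            shared_params=100_000_000,
            module_params=5_000_000,
            num_languages=n,
        )
        print(n, counts["total"], counts["per_language"])
```

In this sketch, adding a language after pre-training only increases `total` by one module's worth of parameters and leaves `per_language` unchanged, which mirrors the post-hoc extension described in the abstract.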
