Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models

  • 2024-10-08 12:44:49
  • Terra Blevins, Tomasz Limisiewicz, Suchin Gururangan, Margaret Li, Hila Gonen, Noah A. Smith, Luke Zettlemoyer

Abstract

Despite their popularity in non-English NLP, multilingual language models often underperform monolingual ones due to inter-language competition for model parameters. We propose Cross-lingual Expert Language Models (X-ELM), which mitigate this competition by independently training language models on subsets of the multilingual corpus. This process specializes X-ELMs to different languages while remaining effective as a multilingual ensemble. Our experiments show that when given the same compute budget, X-ELM outperforms jointly trained multilingual models across all considered languages and that these gains transfer to downstream tasks. X-ELM provides additional benefits over performance improvements: new experts can be iteratively added, adapting X-ELM to new languages without catastrophic forgetting. Furthermore, training is asynchronous, reducing the hardware requirements for multilingual training and democratizing multilingual modeling.

 
