Phylogeny-Inspired Adaptation of Multilingual Models to New Languages

Abstract

Large pretrained multilingual models, trained on dozens of languages, have delivered promising results on a variety of language tasks due to their cross-lingual learning capabilities. Further adapting these models to specific languages, especially ones unseen during pre-training, is an important goal towards expanding the coverage of language technologies. In this study, we show how we can use language phylogenetic information to improve cross-lingual transfer by leveraging closely related languages in a structured, linguistically informed manner. We perform adapter-based training on languages from diverse language families (Germanic, Uralic, Tupian, Uto-Aztecan) and evaluate on both syntactic and semantic tasks, obtaining relative performance improvements of more than 20% over strong, commonly used baselines, especially on languages unseen during pre-training.
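One way to picture "structured, linguistically informed" use of related languages is to walk a phylogenetic tree from a target language up to its family root and attach one adapter per node on that path, so that closely related languages share the upper-level adapters. The sketch below only computes those paths and their overlap. The toy tree, the helper names `adapter_chain` and `shared_ancestors`, and the one-adapter-per-node scheme are illustrative assumptions, not the training setup described in this abstract.

```python
# Hypothetical toy phylogeny: each node maps to its parent (None marks a family root).
# This is an illustrative fragment only; a real setup would use a curated classification.
PARENT = {
    "Germanic": None,
    "North Germanic": "Germanic",
    "Faroese": "North Germanic",
    "Icelandic": "North Germanic",
    "Uralic": None,
    "Finnic": "Uralic",
    "Finnish": "Finnic",
    "Estonian": "Finnic",
}


def adapter_chain(language: str) -> list[str]:
    """Return node names from the family root down to the language itself.

    Under the assumed scheme, one adapter per node would be stacked in this order.
    """
    chain = []
    node = language
    while node is not None:
        if node not in PARENT:
            raise KeyError(f"unknown node: {node}")
        chain.append(node)
        node = PARENT[node]
    return list(reversed(chain))


def shared_ancestors(a: str, b: str) -> list[str]:
    """Nodes (and thus, under the assumption, adapters) that two languages share."""
    nodes_b = set(adapter_chain(b))
    return [n for n in adapter_chain(a) if n in nodes_b]


if __name__ == "__main__":
    print(adapter_chain("Faroese"))                    # ['Germanic', 'North Germanic', 'Faroese']
    print(shared_ancestors("Faroese", "Icelandic"))    # ['Germanic', 'North Germanic']
    print(shared_ancestors("Faroese", "Finnish"))      # []
```

In this framing, a language with little data could borrow from its relatives through the shared family- and genus-level adapters while keeping a small language-specific adapter of its own; whether and how the paper stacks adapters this way is not stated in the abstract.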
