BabelBERT: Massively Multilingual Transformers Meet a Massively Multilingual Lexical Resource

Abstract

While pretrained language models (PLMs) primarily serve as general-purpose text encoders that can be fine-tuned for a wide variety of downstream tasks, recent work has shown that they can also be rewired to produce high-quality word representations (i.e., static word embeddings) and yield good performance in type-level lexical tasks. While existing work has primarily focused on lexical specialization of PLMs in monolingual and bilingual settings, in this work we expose massively multilingual transformers (MMTs, e.g., mBERT or XLM-R) to multilingual lexical knowledge at scale, leveraging BabelNet as the readily available rich source of multilingual and cross-lingual type-level lexical knowledge. Concretely, we leverage BabelNet's multilingual synsets to create synonym pairs across 50 languages and then subject the MMTs (mBERT and XLM-R) to a lexical specialization procedure guided by a contrastive objective. We show that such massively multilingual lexical specialization brings massive gains in two standard cross-lingual lexical tasks, bilingual lexicon induction and cross-lingual word similarity, as well as in cross-lingual sentence retrieval. Crucially, we observe gains for languages unseen in specialization, indicating that the multilingual lexical specialization enables generalization to languages with no lexical constraints. In a series of subsequent controlled experiments, we demonstrate that the pretraining quality of word representations in the MMT for languages involved in specialization has a much larger effect on performance than the linguistic diversity of the set of constraints. Encouragingly, this suggests that lexical tasks involving low-resource languages benefit the most from lexical knowledge of resource-rich languages, which is generally much more available.
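
The abstract states only that specialization is "guided by a contrastive objective" over synonym pairs; it does not give the exact loss. As a rough illustration of what such an objective can look like, the sketch below uses an in-batch InfoNCE-style loss, which is an assumption made here and not necessarily the paper's formulation. The function names, the temperature value, and the toy vectors are all hypothetical; in practice the vectors would be word representations produced by the MMT.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.07):
    """In-batch InfoNCE-style loss (an assumed stand-in for the paper's objective).

    anchors[i] and positives[i] are embeddings of a synonym pair; every other
    positives[j] in the batch acts as a negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        # Temperature-scaled similarity of this anchor to every candidate.
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of picking the true synonym.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy, made-up embeddings: each positive lies close to its own anchor.
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
positives = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

aligned = info_nce_loss(anchors, positives)
# Rotating the positives pairs every anchor with the wrong word.
shuffled = info_nce_loss(anchors, positives[1:] + positives[:1])
print(aligned < shuffled)
```

Minimizing a loss of this shape pulls the representations of synonyms together and pushes other words in the batch apart, regardless of which languages the two words of a pair come from.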
