Improving Pre-Trained Multilingual Models with Vocabulary Expansion

Abstract

Recently, pre-trained language models have achieved remarkable success in a broad range of natural language processing tasks. However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language. Instead of exhaustively pre-training monolingual language models independently, an alternative solution is to pre-train a powerful multilingual deep language model over large-scale corpora in hundreds of languages. However, the vocabulary size for each language in such a model is relatively small, especially for low-resource languages. This limitation inevitably hinders the performance of these multilingual models on tasks such as sequence labeling, wherein in-depth token-level or sentence-level understanding is essential. In this paper, inspired by previous methods designed for monolingual settings, we investigate two approaches (i.e., joint mapping and mixture mapping) based on a pre-trained multilingual model, BERT, for addressing the out-of-vocabulary (OOV) problem on a variety of tasks, including part-of-speech tagging, named entity recognition, machine translation quality estimation, and machine reading comprehension. Experimental results show that using mixture mapping is more promising. To the best of our knowledge, this is the first work that attempts to address and discuss the OOV issue in multilingual settings.
