Dict-BERT: Enhancing Language Model Pre-training with Dictionary

Abstract

Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora. Since PLMs capture word semantics in different contexts, the quality of word representations depends heavily on word frequency, which usually follows a heavy-tailed distribution in the pre-training corpus. Therefore, the embeddings of rare words in the tail are usually poorly optimized. In this work, we focus on enhancing language model pre-training by leveraging definitions of rare words from dictionaries (e.g., Wiktionary). To incorporate a rare word's definition as part of the input, we fetch its definition from the dictionary and append it to the end of the input text sequence. In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word-level and sentence-level alignment between the input text sequence and rare word definitions to enhance language model representations with the dictionary. We evaluate the proposed Dict-BERT model on the language understanding benchmark GLUE and eight specialized domain benchmark datasets. Extensive experiments demonstrate that Dict-BERT can significantly improve the understanding of rare words and boost model performance on various NLP downstream tasks.
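
The input construction described above (look up the definitions of rare words and append them to the end of the input text) can be illustrated with a minimal sketch. This is not the Dict-BERT implementation: the function name `append_rare_definitions`, the count threshold `max_count`, the `[SEP]` separator, the `word: definition` layout, and the toy corpus and dictionary are all assumptions made for illustration.

```python
from collections import Counter


def append_rare_definitions(text, corpus_counts, dictionary, max_count=1, sep="[SEP]"):
    """Append dictionary definitions of rare words to the end of `text`.

    Hypothetical helper: a word counts as rare when its corpus count is
    at or below `max_count` (an assumed rule, not the paper's criterion).
    """
    seen = set()
    parts = [text]
    for raw in text.split():
        # Strip surrounding punctuation and lowercase before lookup.
        word = raw.strip(".,;:!?\"'()").lower()
        if not word or word in seen:
            continue
        seen.add(word)
        is_rare = corpus_counts.get(word, 0) <= max_count
        # Only rare words that have a dictionary entry contribute a definition.
        if is_rare and word in dictionary:
            parts.append(f"{word}: {dictionary[word]}")
    return f" {sep} ".join(parts)


# Toy corpus statistics and dictionary (assumed data, for illustration only).
corpus_counts = Counter("the cat sat on the mat the cat slept".split())
dictionary = {"petrichor": "the earthy smell produced when rain falls on dry soil"}

augmented = append_rare_definitions(
    "The petrichor lingered on the mat.", corpus_counts, dictionary
)
print(augmented)
```

In this sketch, text with no rare dictionary words is returned unchanged, so frequent-word inputs keep their original form.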
