Grounded Compositional Outputs for Adaptive Language Modeling

Abstract

Language models have emerged as a central component across NLP, and a great deal of progress depends on the ability to cheaply adapt them (e.g., through finetuning) to new domains and tasks. A language model's vocabulary (typically selected before training and permanently fixed later) affects its size and is part of what makes it resistant to such adaptation. Prior work has used compositional input embeddings based on surface forms to ameliorate this issue. In this work, we go one step beyond and propose a fully compositional output embedding layer for language models, which is further grounded in information from a structured lexicon (WordNet), namely semantically related words and free-text definitions. To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary. We evaluate the model on conventional language modeling as well as challenging cross-domain settings with an open vocabulary, finding that it matches or outperforms previous state-of-the-art output embedding methods and adaptation approaches. Our analysis attributes the improvements to sample efficiency: our model is more accurate for low-frequency words.
