Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?

Abstract

Natural language processing (NLP) tasks tend to suffer from a paucity of suitably annotated training data, hence the recent success of transfer learning across a wide variety of them. The typical recipe involves: (i) training a deep, possibly bidirectional, neural network with an objective related to language modeling, for which training data is plentiful; and (ii) using the trained network to derive contextual representations that are far richer than standard linear word embeddings such as word2vec, and thus result in important gains. In this work, we wonder whether the opposite perspective is also true: can contextual representations trained for different NLP tasks improve language modeling itself? Since language models (LMs) are predominantly locally optimized, other NLP tasks may help them make better predictions based on the entire semantic fabric of a document. We test the performance of several types of pre-trained embeddings in neural LMs, and we investigate whether it is possible to make the LM more aware of global semantic information through embeddings pre-trained with a domain classification model. Initial experiments suggest that as long as the proper objective criterion is used during training, pre-trained embeddings are likely to be beneficial for neural language modeling.
