Revisiting Topic-Guided Language Models

Abstract

A recent line of work in natural language processing has aimed to combine language models and topic models. These topic-guided language models augment neural language models with topic models, which are unsupervised learning methods that can discover document-level patterns of word use. This paper compares the effectiveness of these methods in a standardized setting. We study four topic-guided language models and two baselines, evaluating the held-out predictive performance of each model on four corpora. Surprisingly, we find that none of these methods outperforms a standard LSTM language model baseline, and most fail to learn good topics. Further, we train a probe on the neural language model and find that the baseline's hidden states already encode topic information. We make public all code used for this study.
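The abstract does not say what form the probe takes. One common design is a linear probe. It maps each LSTM hidden state through a softmax to predicted topic proportions and is trained with cross-entropy against a topic model's document-topic proportions. The pure-Python sketch below assumes that setup. The functions `train_probe` and `predict` and the toy data are hypothetical illustrations, not the authors' code. If the probe's loss falls well below that of a probe that ignores the hidden state (uniform proportions, loss log K), the hidden states carry topic information.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, h):
    """Linear probe: predicted topic proportions softmax(W h + b) for hidden state h."""
    logits = [sum(w_kj * h_j for w_kj, h_j in zip(w_k, h)) + b_k
              for w_k, b_k in zip(W, b)]
    return softmax(logits)

def cross_entropy(theta, p):
    """-sum_k theta_k log p_k: loss of predicted proportions p against target theta."""
    return -sum(t * math.log(q) for t, q in zip(theta, p) if t > 0)

def train_probe(hidden, targets, lr=0.5, epochs=200):
    """Full-batch gradient descent on the mean cross-entropy.

    hidden:  list of n hidden-state vectors (length d), assumed precomputed by an LM
    targets: list of n topic-proportion vectors (length K, each summing to 1),
             assumed to come from a fitted topic model
    """
    n, d, K = len(hidden), len(hidden[0]), len(targets[0])
    W = [[0.0] * d for _ in range(K)]  # zero init gives uniform predictions
    b = [0.0] * K
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(K)]
        gb = [0.0] * K
        for h, theta in zip(hidden, targets):
            p = predict(W, b, h)
            for k in range(K):
                # d(loss)/d(logit_k) = p_k - theta_k when theta sums to 1
                diff = p[k] - theta[k]
                gb[k] += diff
                for j in range(d):
                    gW[k][j] += diff * h[j]
        for k in range(K):
            b[k] -= lr * gb[k] / n
            for j in range(d):
                W[k][j] -= lr * gW[k][j] / n
    return W, b

# Hypothetical toy data: 2-d "hidden states" paired with 2-topic proportions.
hidden = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
targets = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
W, b = train_probe(hidden, targets)

# For brevity this scores the training points; a real probe is scored on held-out data.
probe_loss = sum(cross_entropy(t, predict(W, b, h))
                 for h, t in zip(hidden, targets)) / len(hidden)
uniform_loss = math.log(2)  # loss of a probe that ignores h and predicts uniform proportions
```

Comparing against the uniform baseline separates what the probe learns from the hidden state from what it could guess without it. With real data, a stronger baseline predicts the corpus-average topic proportions.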
