Abstract
A recent line of work in natural language processing has aimed to combine language models and topic models. These topic-guided language models augment neural language models with topic models, unsupervised learning methods that can discover document-level patterns of word use. This paper compares the effectiveness of these methods in a standardized setting. We study four topic-guided language models and two baselines, evaluating the held-out predictive performance of each model on four corpora. Surprisingly, we find that none of these methods outperform a standard LSTM language model baseline, and most fail to learn good topics. Further, we train a probe of the neural language model that shows that the baseline's hidden states already encode topic information. We make public all code used for this study.