Topic Modeling in Embedding Spaces

  • 2019-07-08 03:50:57
  • Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei

Abstract

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings. In particular, it models each word with a categorical distribution whose natural parameter is the inner product between a word embedding and an embedding of its assigned topic. To fit the ETM, we develop an efficient amortized variational inference algorithm. The ETM discovers interpretable topics even with large vocabularies that include rare words and stop words. It outperforms existing document models, such as latent Dirichlet allocation (LDA), in terms of both topic quality and predictive performance.
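
The word-level likelihood described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the names `rho` (word embeddings), `alpha` (topic embeddings), `theta` (topic proportions) and the toy sizes are assumptions chosen for the example, and the embeddings are random rather than learned. Mixing the topics with `theta` to get a per-document word distribution is the usual topic-model construction and is likewise assumed here, since the abstract does not spell it out.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topic_word_distributions(rho, alpha):
    """Categorical distribution over the vocabulary for each topic.

    rho:   (V, L) word embeddings (hypothetical name)
    alpha: (K, L) topic embeddings (hypothetical name)
    The natural parameter for word v under topic k is the inner
    product rho[v] . alpha[k]; a softmax over v normalizes it.
    """
    logits = alpha @ rho.T          # shape (K, V)
    return softmax(logits, axis=1)  # each row sums to 1

# Toy sizes and random values, for illustration only.
rng = np.random.default_rng(0)
V, L, K = 8, 4, 3                   # vocabulary size, embedding dim, topics
rho = rng.normal(size=(V, L))
alpha = rng.normal(size=(K, L))

beta = topic_word_distributions(rho, alpha)

# Assumed document-level step: mix topics with proportions theta.
theta = softmax(rng.normal(size=K))
word_probs = theta @ beta           # shape (V,), sums to 1
```

Because every topic shares the same word embedding matrix, a rare word with few occurrences can still receive a sensible probability under a topic through its position in the embedding space.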

 
