Continuous Semantic Topic Embedding Model Using Variational Autoencoder

Abstract

This paper proposes the continuous semantic topic embedding model (CSTEM), which finds latent topic variables in documents using a continuous semantic distance function between topics and words by means of a variational autoencoder (VAE). The semantic distance can be represented by any symmetric bell-shaped geometric distance function on Euclidean space; the Mahalanobis distance is used in this paper. To make the semantic distance behave properly, we introduce an additional model parameter for each word that removes from this distance a global factor indicating how likely the word is to occur regardless of its topic. This addresses a problem in previous topic models with continuous word embeddings, where the Gaussian distribution could not explain semantic relations correctly, and it helps to obtain higher topic coherence. In experiments on the 20 Newsgroups, NIPS papers, and CNN/Dailymail corpora, our model matches the performance of recent state-of-the-art models while also generating topic embedding vectors, which make it possible to observe where the topic vectors are embedded relative to the word vectors in real Euclidean space and how the topics are semantically related to each other.
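
The scoring idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the softmax form, the function names (`mahalanobis_sq`, `topic_word_probs`), the use of a precision (inverse covariance) matrix, and the toy vectors are all assumptions made for illustration. Each word receives a score equal to a per-word bias minus half its squared Mahalanobis distance to the topic vector, so the bias absorbs how often a word occurs regardless of topic.

```python
import math


def mahalanobis_sq(x, mu, precision):
    """Squared Mahalanobis distance (x - mu)^T P (x - mu).

    `precision` is the inverse covariance matrix P, given as a list of rows.
    """
    diff = [xi - mi for xi, mi in zip(x, mu)]
    n = len(diff)
    return sum(diff[i] * precision[i][j] * diff[j]
               for i in range(n) for j in range(n))


def topic_word_probs(word_vecs, word_bias, topic_mu, topic_precision):
    """Hypothetical topic-to-word distribution.

    Assumed form: softmax over the vocabulary of
    (word bias - 0.5 * squared Mahalanobis distance to the topic).
    """
    scores = [b - 0.5 * mahalanobis_sq(v, topic_mu, topic_precision)
              for v, b in zip(word_vecs, word_bias)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 2-D example: three word vectors and one topic at the origin.
word_vecs = [[0.0, 0.0], [1.0, 0.0], [3.0, 3.0]]
topic_mu = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity precision = Euclidean distance

# With zero biases, words closer to the topic are more probable.
probs = topic_word_probs(word_vecs, [0.0, 0.0, 0.0], topic_mu, identity)

# A large bias on the distant word raises its probability regardless of topic.
probs_biased = topic_word_probs(word_vecs, [0.0, 0.0, 5.0], topic_mu, identity)
```

In this toy setup the first word, which coincides with the topic vector, gets the highest probability when all biases are zero, and raising the bias of the third word increases its probability without moving it in the embedding space.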
