Paraphrase Generation with Latent Bag of Words

  • 2020-01-07 09:22:58
  • Yao Fu, Yansong Feng, John P. Cunningham

Abstract

Paraphrase generation is a longstanding and important problem in natural language processing. Recent progress in deep generative models has also shown promising results on discrete latent variables for text generation. Inspired by variational autoencoders with discrete latent structures, in this work we propose a latent bag of words (BOW) model for paraphrase generation. We ground the semantics of a discrete latent variable by the BOW from the target sentences. We use this latent variable to build a fully differentiable content planning and surface realization model. Specifically, we use source words to predict their neighbors and model the target BOW with a mixture of softmax. We use Gumbel top-k reparameterization to perform differentiable subset sampling from the predicted BOW distribution. We retrieve the sampled word embeddings and use them to augment the decoder and guide its generation search space. Our latent BOW model not only enhances the decoder, but also exhibits clear interpretability. We show the model's interpretability with regard to (i) unsupervised learning of word neighbors and (ii) the step-by-step generation procedure. Extensive experiments demonstrate the transparent and effective generation process of this model. Our code can be found at https://github.com/FranxYao/dgm_latent_bow
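
To make the two sampling ideas named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation (that lives in the linked repository). It only illustrates, on a toy example, how per-source-word softmaxes could be mixed into one BOW distribution and how Gumbel noise followed by a top-k selection draws a subset of distinct words. The vocabulary, logits, uniform mixture weights, and the function names `mixture_of_softmax` and `gumbel_top_k` are all assumptions made for illustration. The continuous relaxation needed to pass gradients through the sample during training is not shown.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_softmax(neighbor_logits, mixture_weights):
    """Mix one softmax per source word into a single distribution over the vocabulary.

    neighbor_logits: one list of vocabulary logits per source word (hypothetical values).
    mixture_weights: non-negative weights, one per source word, summing to 1.
    """
    vocab_size = len(neighbor_logits[0])
    mixed = [0.0] * vocab_size
    for weight, logits in zip(mixture_weights, neighbor_logits):
        for i, p in enumerate(softmax(logits)):
            mixed[i] += weight * p
    return mixed


def gumbel_top_k(probs, k, rng):
    """Draw k distinct indices by adding Gumbel(0, 1) noise to log-probabilities
    and keeping the k largest perturbed scores (sampling without replacement)."""
    perturbed = []
    for i, p in enumerate(probs):
        u = max(rng.random(), 1e-12)        # avoid log(0)
        gumbel = -math.log(-math.log(u))    # inverse-CDF sample from Gumbel(0, 1)
        perturbed.append((math.log(p + 1e-20) + gumbel, i))
    perturbed.sort(reverse=True)
    return [i for _, i in perturbed[:k]]


# Toy setup (all values are illustrative assumptions, not data from the paper).
rng = random.Random(0)
vocab = ["how", "what", "best", "way", "learn", "study", "python", "quickly"]
neighbor_logits = [[rng.gauss(0, 1) for _ in vocab] for _ in range(3)]  # 3 source words
mixture_weights = [1 / 3, 1 / 3, 1 / 3]

bow_probs = mixture_of_softmax(neighbor_logits, mixture_weights)
sampled_ids = gumbel_top_k(bow_probs, 3, rng)
sampled_words = [vocab[i] for i in sampled_ids]
print(sampled_words)
```

In a full model, the embeddings of the sampled words would then be handed to the decoder, as the abstract describes. Here the sketch stops at the sampled word list.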

 
