SentenceMIM: A Latent Variable Language Model

Abstract

SentenceMIM is a probabilistic auto-encoder for language data, trained with Mutual Information Machine (MIM) learning to provide a fixed-length representation of variable-length language observations (i.e., similar to a VAE). Previous attempts to learn VAEs for language data faced challenges due to posterior collapse. MIM learning encourages high mutual information between observations and latent variables, and is robust against posterior collapse. As such, it learns informative representations whose dimension can be an order of magnitude higher than existing language VAEs. Importantly, the SentenceMIM loss has no hyper-parameters, simplifying optimization. We compare SentenceMIM with VAE and AE on multiple datasets. SentenceMIM yields excellent reconstruction, comparable to AEs, with a rich structured latent space, comparable to VAEs. The structured latent representation is demonstrated with interpolation between sentences of different lengths. We demonstrate the versatility of SentenceMIM by utilizing a trained model for question-answering and transfer learning, without fine-tuning, outperforming VAE and AE with similar architectures.
