Abstract
Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have presented diffusion as an alternative to existing pretrained language models. We view diffusion and existing language models as complementary. We demonstrate that encoder-decoder language models can be utilized to efficiently learn high-quality language autoencoders. We then demonstrate that continuous diffusion models can be learned in the latent space of the language autoencoder, enabling us to sample continuous latent representations that can be decoded into natural language with the pretrained decoder. We validate the effectiveness of our approach for unconditional, class-conditional, and sequence-to-sequence language generation. We demonstrate across multiple diverse data sets that our latent language diffusion models are significantly more effective than previous diffusion language models.
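The pipeline described above (encode text into a continuous latent, run diffusion in that latent space, then decode) can be illustrated with a minimal sketch of the latent-diffusion part only. The sketch assumes a cosine noise schedule, an epsilon-prediction training loss, and deterministic DDIM sampling. None of these choices are stated in the abstract, and the helper names (`alpha_bar`, `add_noise`, `predict_z0`, `mse`, `ddim_sample`) are hypothetical. Latents are plain lists of floats standing in for the autoencoder's output, and the pretrained encoder and decoder are omitted.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]  # stand-in for a continuous latent from the (omitted) language autoencoder


def alpha_bar(t: float) -> float:
    # Assumed cosine schedule: fraction of signal kept at time t in [0, 1].
    # Clamped so sqrt(alpha_bar) never reaches 0 at t = 1.
    return max(math.cos(0.5 * math.pi * t) ** 2, 1e-5)


def add_noise(z0: Vector, t: float, rng: random.Random) -> Tuple[Vector, Vector]:
    # Forward process q(z_t | z_0) = N(sqrt(abar) * z_0, (1 - abar) * I) on the latent.
    a = alpha_bar(t)
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, eps)]
    return zt, eps


def predict_z0(zt: Vector, eps_hat: Vector, t: float) -> Vector:
    # Invert the forward process given a noise estimate to get a clean-latent estimate.
    a = alpha_bar(t)
    return [(z - math.sqrt(1.0 - a) * e) / math.sqrt(a) for z, e in zip(zt, eps_hat)]


def mse(eps: Vector, eps_hat: Vector) -> float:
    # Assumed epsilon-prediction objective: mean squared error between true and predicted noise.
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(eps)


def ddim_sample(
    denoiser: Callable[[Vector, float], Vector], dim: int, steps: int, rng: random.Random
) -> Vector:
    # Deterministic DDIM sampling: start from Gaussian noise at t = 1 and step down to t = 0.
    zt = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for i in range(steps, 0, -1):
        t, s = i / steps, (i - 1) / steps
        eps_hat = denoiser(zt, t)               # a trained network would go here
        z0_hat = predict_z0(zt, eps_hat, t)
        a_s = alpha_bar(s)
        # Re-noise the clean estimate to the next (smaller) noise level; at s = 0 this returns z0_hat.
        zt = [math.sqrt(a_s) * x + math.sqrt(1.0 - a_s) * e for x, e in zip(z0_hat, eps_hat)]
    return zt
```

In the approach the abstract describes, `denoiser` would be a learned network and the returned latent would be passed to the pretrained decoder to produce text. For testing, an oracle denoiser that knows the target latent can replace the network, so sampling should recover that latent exactly up to floating-point error.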