Abstract
Variational autoencoders (VAEs) are powerful likelihood-based generative models with applications in many domains. However, they struggle to generate high-quality images, especially when samples are drawn from the prior without any tempering. One explanation for VAEs' poor generative quality is the prior hole problem: the prior distribution fails to match the aggregate approximate posterior. Because of this mismatch, there are regions of the latent space with high density under the prior that do not correspond to any encoded image, and samples from those regions are decoded to corrupted images. To tackle this issue, we propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base prior closer to the aggregate posterior. We train the reweighting factor by noise contrastive estimation, and we generalize it to hierarchical VAEs with many latent variable groups. Our experiments confirm that the proposed noise contrastive priors improve the generative performance of state-of-the-art VAEs by a large margin on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ 256 datasets.
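
For concreteness, the construction described above can be sketched in symbols. The notation here is an assumption made for illustration and is not fixed by the abstract: p(z) is the base prior, q(z) the aggregate approximate posterior, r(z) the reweighting factor, Z a normalizing constant, and D(z) a binary classifier trained to tell samples of q(z) apart from samples of p(z). The classifier objective shown is the standard noise contrastive (binary cross-entropy) form with equal numbers of samples from each distribution; the exact training recipe may differ in its details.

```latex
% Energy-based prior: base prior times a reweighting factor (notation assumed)
p_{\mathrm{NCP}}(z) = \frac{1}{Z}\, r(z)\, p(z),
\qquad Z = \int r(z)\, p(z)\, \mathrm{d}z

% Noise contrastive estimation: classify aggregate-posterior samples against base-prior samples
\max_{D}\; \mathbb{E}_{z \sim q(z)}\big[\log D(z)\big]
         + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(z)\big)\big]

% The optimal classifier recovers the density ratio, which serves as the reweighting factor
D^{*}(z) = \frac{q(z)}{q(z) + p(z)}
\quad\Longrightarrow\quad
r(z) \approx \frac{D(z)}{1 - D(z)} \approx \frac{q(z)}{p(z)}
```

Under this reading, r(z) is small exactly where the base prior places mass that the aggregate posterior does not, so the reweighted prior assigns low density to the "holes" that would otherwise be decoded to corrupted images.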