Popular generative model learning methods such as Generative AdversarialNetworks (GANs), and Variational Autoencoders (VAE) enforce the latentrepresentation to follow simple distributions such as isotropic Gaussian. Inthis paper, we argue that learning a complicated distribution over the latentspace of an auto-encoder enables more accurate modeling of complicated datadistributions. Based on this observation, we propose a two stage optimizationprocedure which maximizes an approximate implicit density model. Weexperimentally verify that our method outperforms GANs and VAEs on two imagedatasets (MNIST, CELEB-A). We also show that our approach is amenable tolearning generative model for sequential data, by learning to generate speechand music.