Normalizing Flows are Capable Generative Models

Abstract

Normalizing Flows (NFs) are likelihood-based models for continuous inputs.They have demonstrated promising results on both density estimation andgenerative modeling tasks, but have received relatively little attention inrecent years. In this work, we demonstrate that NFs are more powerful thanpreviously believed. We present TarFlow: a simple and scalable architecturethat enables highly performant NF models. TarFlow can be thought of as aTransformer-based variant of Masked Autoregressive Flows (MAFs): it consists ofa stack of autoregressive Transformer blocks on image patches, alternating theautoregression direction between layers. TarFlow is straightforward to trainend-to-end, and capable of directly modeling and generating pixels. We alsopropose three key techniques to improve sample quality: Gaussian noiseaugmentation during training, a post training denoising procedure, and aneffective guidance method for both class-conditional and unconditionalsettings. Putting these together, TarFlow sets new state-of-the-art results onlikelihood estimation for images, beating the previous best methods by a largemargin, and generates samples with quality and diversity comparable todiffusion models, for the first time with a stand-alone NF model. We make ourcode available at https://github.com/apple/ml-tarflow.

Quick Read (beta)

loading the full paper ...