STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

  • 2025-06-06 18:58:39
  • Jiatao Gu, Tianrong Chen, David Berthelot, Huangjie Zheng, Yuyang Wang, Ruixiang Zhang, Laurent Dinh, Miguel Angel Bautista, Josh Susskind, Shuangfei Zhai

Abstract

We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers. We first establish the theoretical universality of TARFlow for modeling continuous distributions. Building on this foundation, we introduce several key architectural and algorithmic innovations to significantly enhance scalability: (1) a deep-shallow design, wherein a deep Transformer block captures most of the model representational capacity, complemented by a few shallow Transformer blocks that are computationally efficient yet substantially beneficial; (2) modeling in the latent space of pretrained autoencoders, which proves more effective than direct pixel-level modeling; and (3) a novel guidance algorithm that significantly boosts sample quality. Crucially, our model remains an end-to-end normalizing flow, enabling exact maximum likelihood training in continuous spaces without discretization. STARFlow achieves competitive performance in both class-conditional and text-conditional image generation tasks, approaching state-of-the-art diffusion models in sample quality. To our knowledge, this work is the first successful demonstration of normalizing flows operating effectively at this scale and resolution.
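
As a rough illustration of what "exact maximum likelihood training" means for an autoregressive flow, the sketch below implements a generic affine autoregressive transform over a short sequence of continuous tokens and evaluates the exact log-likelihood with the change-of-variables formula. It is not the STARFlow or TARFlow implementation: the `conditioner` function is a hypothetical toy stand-in for a causal Transformer, and the sizes `T` and `D`, the weight matrices, and all function names are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 4, 3  # hypothetical sequence length and token dimension

# Toy stand-in for a causal Transformer (an assumption, not the paper's model):
# fixed random linear maps applied to the mean of the preceding tokens.
W_mu = rng.normal(scale=0.5, size=(D, D))
W_alpha = rng.normal(scale=0.5, size=(D, D))


def conditioner(prev):
    """Return (mu, alpha) for the next token given preceding tokens of shape (k, D)."""
    if prev.shape[0] == 0:
        # The first token has no context, so it is left unchanged.
        return np.zeros(D), np.zeros(D)
    h = prev.mean(axis=0)
    return h @ W_mu, np.tanh(h @ W_alpha)


def forward(x):
    """Map data x to noise z and return log|det dz/dx|."""
    z = np.empty_like(x)
    logdet = 0.0
    for t in range(x.shape[0]):
        mu, alpha = conditioner(x[:t])
        z[t] = (x[t] - mu) * np.exp(-alpha)
        # The Jacobian is triangular, so its log-determinant is a plain sum.
        logdet -= alpha.sum()
    return z, logdet


def inverse(z):
    """Map noise z back to data x, one token at a time (sampling direction)."""
    x = np.empty_like(z)
    for t in range(z.shape[0]):
        mu, alpha = conditioner(x[:t])  # only already-generated tokens are read
        x[t] = z[t] * np.exp(alpha) + mu
    return x


def log_likelihood(x):
    """Exact log p(x) under a standard normal prior on z."""
    z, logdet = forward(x)
    log_prior = -0.5 * (z ** 2).sum() - 0.5 * z.size * np.log(2 * np.pi)
    return log_prior + logdet


x = rng.normal(size=(T, D))
z, logdet = forward(x)
x_rec = inverse(z)
```

Because the transform is invertible and its Jacobian determinant is available in closed form, `log_likelihood(x)` is an exact density value rather than a bound, and `inverse(forward(x)[0])` recovers `x` up to floating-point error.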

 
