SpectralAR: Spectral Autoregressive Visual Generation

  • 2025-06-12 18:57:44
  • Yuanhui Huang, Weiliang Chen, Wenzhao Zheng, Yueqi Duan, Jie Zhou, Jiwen Lu

Abstract

Autoregressive visual generation has garnered increasing attention due to its scalability and compatibility with other modalities compared with diffusion models. Most existing methods construct visual sequences as spatial patches for autoregressive generation. However, image patches are inherently parallel, contradicting the causal nature of autoregressive modeling. To address this, we propose a Spectral AutoRegressive (SpectralAR) visual generation framework, which realizes causality for visual sequences from the spectral perspective. Specifically, we first transform an image into ordered spectral tokens with Nested Spectral Tokenization, representing lower to higher frequency components. We then perform autoregressive generation in a coarse-to-fine manner with the sequences of spectral tokens. By considering different levels of detail in images, our SpectralAR achieves both sequence causality and token efficiency without bells and whistles. We conduct extensive experiments on ImageNet-1K for image reconstruction and autoregressive generation, and SpectralAR achieves 3.02 gFID with only 64 tokens and 310M parameters. Project page: https://huang-yh.github.io/spectralar/.
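
The lower-to-higher frequency ordering described above can be illustrated with a small NumPy sketch. This is not the paper's Nested Spectral Tokenization, whose tokenizer is not specified in the abstract; it swaps in a plain 2D FFT with nested low-pass masks, and the helper names `radial_frequency` and `nested_lowpass`, the random test image, and the choice of four levels are all assumptions made for illustration. It only shows why a coarse-to-fine spectral sequence is naturally ordered: each added frequency band refines the previous reconstruction.

```python
import numpy as np


def radial_frequency(h, w):
    """Radial frequency magnitude for each bin, in np.fft.fft2 layout."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fy ** 2 + fx ** 2)


def nested_lowpass(image, num_levels):
    """Reconstruct `image` from progressively larger low-frequency bands.

    Hypothetical helper: a fixed FFT stand-in for a learned tokenizer.
    Level k keeps every frequency kept by level k-1, so the bands are nested.
    """
    spectrum = np.fft.fft2(image)
    radius = radial_frequency(*image.shape)
    # Evenly spaced cutoffs; the last one covers the full spectrum.
    cutoffs = np.linspace(0.0, radius.max(), num_levels + 1)[1:]
    recons = []
    for cutoff in cutoffs:
        mask = radius <= cutoff
        recons.append(np.real(np.fft.ifft2(spectrum * mask)))
    return recons


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))  # stand-in for a grayscale image
recons = nested_lowpass(image, num_levels=4)
# Mean squared error of each coarse-to-fine reconstruction.
errors = [float(np.mean((image - r) ** 2)) for r in recons]
```

Because the bands are nested, the reconstruction error in `errors` cannot increase from one level to the next, and the final level recovers the image up to floating-point error.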

 
