Abstract
Conventional wisdom holds that autoregressive models for image generation are typically accompanied by vector-quantized tokens. We observe that while a discrete-valued space can facilitate representing a categorical distribution, it is not a necessity for autoregressive modeling. In this work, we propose to model the per-token probability distribution using a diffusion procedure, which allows us to apply autoregressive models in a continuous-valued space. Rather than using categorical cross-entropy loss, we define a Diffusion Loss function to model the per-token probability. This approach eliminates the need for discrete-valued tokenizers. We evaluate its effectiveness across a wide range of cases, including standard autoregressive models and generalized masked autoregressive (MAR) variants. By removing vector quantization, our image generator achieves strong results while enjoying the speed advantage of sequence modeling. We hope this work will motivate the use of autoregressive generation in other continuous-valued domains and applications.
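To make the idea of a per-token diffusion loss concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the formula, so the sketch assumes the standard noise-prediction objective: a continuous token x is corrupted to x_t, and a small network conditioned on the autoregressive model's output vector z is trained to predict the added noise. The cosine schedule, the single linear layer `toy_noise_predictor`, and all names and dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def cosine_alpha_bar(t, T):
    """Cumulative signal level for integer steps t in [1, T] (assumed cosine schedule)."""
    return np.cos(0.5 * np.pi * t / (T + 1)) ** 2

def toy_noise_predictor(x_t, t_frac, z, W):
    """Hypothetical stand-in for a small denoising network: a single linear layer
    acting on the noisy token, the conditioning vector z, and the normalized timestep."""
    feats = np.concatenate([x_t, z, t_frac[:, None]], axis=1)
    return feats @ W

def diffusion_loss(x, z, W, rng, T=1000):
    """Monte Carlo estimate of E ||eps - eps_theta(x_t | t, z)||^2 over a batch of tokens."""
    n = x.shape[0]
    t = rng.integers(1, T + 1, size=n)            # one random timestep per token
    a_bar = cosine_alpha_bar(t, T)[:, None]       # shape (n, 1)
    eps = rng.standard_normal(x.shape)            # Gaussian noise to be predicted
    x_t = np.sqrt(a_bar) * x + np.sqrt(1.0 - a_bar) * eps  # noised continuous token
    eps_hat = toy_noise_predictor(x_t, t / T, z, W)
    return float(np.mean(np.sum((eps - eps_hat) ** 2, axis=1)))

rng = np.random.default_rng(0)
d_x, d_z, n = 4, 8, 16                            # illustrative token, condition, batch sizes
x = rng.standard_normal((n, d_x))                 # continuous-valued tokens (no quantization)
z = rng.standard_normal((n, d_z))                 # stand-in for autoregressive outputs
W = 0.01 * rng.standard_normal((d_x + d_z + 1, d_x))
loss = diffusion_loss(x, z, W, rng)
```

In this sketch the loss takes the place that categorical cross-entropy would occupy for discrete tokens: it is computed per token from z, so in a trainable version its gradient would flow back into whatever sequence model produced z.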