Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Abstract

Autoregressive visual generation models typically rely on tokenizers to compress images into tokens that can be predicted sequentially. A fundamental dilemma exists in token representation: discrete tokens enable straightforward modeling with standard cross-entropy loss, but suffer from information loss and tokenizer training instability; continuous tokens better preserve visual details, but require complex distribution modeling, complicating the generation pipeline. In this paper, we propose TokenBridge, which bridges this gap by maintaining the strong representation capacity of continuous tokens while preserving the modeling simplicity of discrete tokens. To achieve this, we decouple discretization from the tokenizer training process through post-training quantization that directly obtains discrete tokens from continuous representations. Specifically, we introduce a dimension-wise quantization strategy that independently discretizes each feature dimension, paired with a lightweight autoregressive prediction mechanism that efficiently models the resulting large token space. Extensive experiments show that our approach achieves reconstruction and generation quality on par with continuous methods while using standard categorical prediction. This work demonstrates that bridging discrete and continuous paradigms can effectively harness the strengths of both approaches, providing a promising direction for high-quality visual generation with simple autoregressive modeling. Project page: https://yuqingwang1029.github.io/TokenBridge.
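
As a rough illustration of the dimension-wise post-training quantization described above, the following minimal sketch discretizes each feature dimension of a continuous token independently and then maps the indices back to approximate continuous values. It is not the authors' implementation: the function names, the uniform bin layout, the clipping range [-3, 3], and the bin count of 16 are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import List


def quantize_dimwise(vec: List[float], num_bins: int = 16,
                     lo: float = -3.0, hi: float = 3.0) -> List[int]:
    """Map each feature dimension to a bin index, independently of the others.

    Assumption (not from the paper): uniform bins over a fixed range [lo, hi].
    """
    width = (hi - lo) / num_bins
    indices = []
    for x in vec:
        clipped = min(max(x, lo), hi)          # clip out-of-range values
        idx = int((clipped - lo) / width)      # uniform bin index
        indices.append(min(idx, num_bins - 1))  # x == hi falls in the last bin
    return indices


def dequantize_dimwise(indices: List[int], num_bins: int = 16,
                       lo: float = -3.0, hi: float = 3.0) -> List[float]:
    """Recover an approximate continuous value (the bin center) per dimension."""
    width = (hi - lo) / num_bins
    return [lo + (i + 0.5) * width for i in indices]


if __name__ == "__main__":
    token = [0.1, -1.7, 2.9, 0.0]              # hypothetical continuous token
    ids = quantize_dimwise(token)
    approx = dequantize_dimwise(ids)
    print(ids, approx)
```

Under these assumptions, a token with C dimensions and B bins per dimension has B^C possible joint values, which is far too many for a single softmax once C grows beyond a handful of dimensions. This is one way to read the abstract's motivation for predicting the per-dimension indices autoregressively, each with an ordinary B-way categorical output.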
