ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL

  • 2025-05-30 18:59:48
  • Yu Zhang, Yunqi Li, Yifan Yang, Rui Wang, Yuqing Yang, Dai Qi, Jianmin Bao, Dongdong Chen, Chong Luo, Lili Qiu

Abstract

Although chain-of-thought reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning on a newly generated reasoning dataset of written rationales, and then refines its outputs using Group Relative Policy Optimization (GRPO). To enable the model to reason through text before generating images, we automatically generate and release a corpus of model-crafted rationales paired with visual prompts, enabling controlled planning of object layouts, styles, and scene compositions. Our GRPO algorithm uses reward signals from a pretrained vision-language model to assess overall visual quality, optimizing the policy in each update. Evaluations on GenEval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. More: aka.ms/reasongen.
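
The abstract does not spell out how rewards enter the policy update. As a rough illustration only, the sketch below shows the group-relative advantage normalization commonly associated with Group Relative Policy Optimization: several outputs are sampled for the same prompt, each is scored, and each score is normalized against the group's mean and standard deviation. The function name, the `eps` constant, and the example scores are assumptions made for illustration; they are not taken from the paper, and the authors' actual reward model and update rule may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Illustrative assumption: advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores a reward model might assign to four images for one prompt.
scores = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(scores)
print(advantages)
```

Samples scored above the group average receive a positive advantage and those below receive a negative one, so no separately learned value function is needed to provide a baseline.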

 
