Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

  • 2025-10-21 17:02:48
  • Kaihang Pan, Yang Wu, Wendong Bu, Kai Shen, Juncheng Li, Yingting Wang, Yunfei Li, Siliang Tang, Jun Xiao, Fei Wu, Hang Zhao, Yueting Zhuang

Abstract

Recent endeavors in Multimodal Large Language Models (MLLMs) aim to unify visual comprehension and generation. However, these two capabilities remain largely independent, as if they are two separate functions encapsulated within the same model. Consequently, visual comprehension does not enhance visual generation, and the reasoning mechanisms of LLMs have not been fully integrated to revolutionize image generation. In this paper, we propose to enable the collaborative co-evolution of visual comprehension and generation, advancing image generation into an iterative introspective process. We introduce a two-stage training approach: supervised fine-tuning teaches the MLLM with the foundational ability to generate genuine CoT for visual generation, while reinforcement learning activates its full potential via an exploration-exploitation trade-off. Ultimately, we unlock the Aha moment in visual generation, advancing MLLMs from text-to-image tasks to unified image generation. Extensive experiments demonstrate that our model not only excels in text-to-image generation and image editing, but also functions as a superior image semantic evaluator with enhanced visual comprehension capabilities. Project Page: https://janus-pro-r1.github.io.

 
