Step1X-Edit: A Practical Framework for General Image Editing

  • 2025-04-24 18:25:12
  • Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming Wang, Honghao Fu, Chunrui Han, Guopeng Li, Yuang Peng, Quan Sun, Jingwei Wu, Yan Cai, Zheng Ge, Ranchen Ming, Lei Xia, Xianfang Zeng, Yibo Zhu, Binxing Jiao, Xiangyu Zhang, Gang Yu, Daxin Jiang

Abstract

In recent years, image editing models have developed rapidly. The recent unveiling of cutting-edge multimodal models such as GPT-4o and Gemini2 Flash has introduced highly promising image editing capabilities. These models can fulfill the vast majority of user-driven editing requirements, marking a significant advance in the field of image manipulation. However, a large gap remains between open-source algorithms and these closed-source models. In this paper, we therefore release a state-of-the-art image editing model, called Step1X-Edit, which provides performance comparable to closed-source models such as GPT-4o and Gemini2 Flash. More specifically, we adopt a multimodal LLM to process the reference image and the user's editing instruction. A latent embedding is extracted and integrated with a diffusion image decoder to obtain the target image. To train the model, we build a data generation pipeline that produces a high-quality dataset. For evaluation, we develop GEdit-Bench, a novel benchmark rooted in real-world user instructions. Experimental results on GEdit-Bench demonstrate that Step1X-Edit outperforms existing open-source baselines by a substantial margin and approaches the performance of leading proprietary models, thereby making a significant contribution to the field of image editing.
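
The data flow described in the abstract (a multimodal LLM encodes the reference image and instruction into a latent embedding, which then conditions a diffusion image decoder) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in and not the Step1X-Edit implementation: the function names, the vector "image", the seeded pseudo-embedding, and the simple iterative refinement loop are all assumptions chosen only to show how the two stages connect.

```python
import random
from typing import List

Vector = List[float]


def toy_mllm_encode(image: Vector, instruction: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for the multimodal LLM.

    Fuses the reference image and the editing instruction into a single
    latent embedding. A real model would use learned weights; here the
    instruction text seeds a deterministic pseudo-embedding.
    """
    rng = random.Random(instruction)  # string seeds are deterministic
    text_part = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    image_mean = sum(image) / len(image)
    return [t + image_mean for t in text_part]


def toy_diffusion_decode(embedding: Vector, size: int,
                         steps: int = 10, seed: int = 0) -> Vector:
    """Hypothetical stand-in for the diffusion image decoder.

    Starts from Gaussian noise and refines it over several steps toward
    a target determined by the conditioning embedding. This mimics only
    the shape of iterative denoising, not a real diffusion sampler.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]
    target = [embedding[i % len(embedding)] for i in range(size)]
    for _ in range(steps):
        # Move each value halfway toward its conditioned target.
        x = [xi + 0.5 * (ti - xi) for xi, ti in zip(x, target)]
    return x


def edit_image(image: Vector, instruction: str) -> Vector:
    """Two-stage pipeline: encode (image, instruction), then decode."""
    embedding = toy_mllm_encode(image, instruction)
    return toy_diffusion_decode(embedding, size=len(image))


if __name__ == "__main__":
    reference = [0.1, 0.5, 0.9, 0.3]  # toy "image" as a flat vector
    edited = edit_image(reference, "make the sky pink")
    print(len(edited))
```

The point of the sketch is the interface between the stages: the decoder never sees the raw instruction, only the embedding produced by the encoder, so different instructions on the same image yield different outputs.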

 
