Abstract
Diffusion-based generative models have revolutionized object-oriented image editing, yet their deployment in realistic object removal and insertion remains hampered by challenges such as the intricate interplay of physical effects and insufficient paired training data. In this work, we introduce OmniPaint, a unified framework that re-conceptualizes object removal and insertion as interdependent processes rather than isolated tasks. Leveraging a pre-trained diffusion prior along with a progressive training pipeline comprising initial paired sample optimization and subsequent large-scale unpaired refinement via CycleFlow, OmniPaint achieves precise foreground elimination and seamless object insertion while faithfully preserving scene geometry and intrinsic properties. Furthermore, our novel CFD metric offers a robust, reference-free evaluation of context consistency and object hallucination, establishing a new benchmark for high-fidelity image editing. Project page: https://yeates.github.io/OmniPaint-Page/