CoreEditor: Consistent 3D Editing via Correspondence-constrained Diffusion

Abstract

Text-driven 3D editing seeks to modify 3D scenes according to textual descriptions, and most existing approaches tackle this by adapting pre-trained 2D image editors to multi-view inputs. However, without explicit control over multi-view information exchange, they often fail to maintain cross-view consistency, leading to insufficient edits and blurry details. We introduce CoreEditor, a novel framework for consistent text-to-3D editing. The key innovation is a correspondence-constrained attention mechanism that enforces precise interactions between pixels expected to remain consistent throughout the diffusion denoising process. Beyond relying solely on geometric alignment, we further incorporate semantic similarity estimated during denoising, enabling more reliable correspondence modeling and robust multi-view editing. In addition, we design a selective editing pipeline that allows users to choose preferred results from multiple candidates, offering greater flexibility and user control. Extensive experiments show that CoreEditor produces high-quality, 3D-consistent edits with sharper details, significantly outperforming prior methods.
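The abstract does not specify how the correspondence-constrained attention is implemented. The following is therefore only a minimal, hypothetical sketch of the general idea under stated assumptions. Tokens from all views are flattened into one sequence. Geometric correspondences are assumed to be supplied as a boolean matrix (for example, derived from depth and camera poses). "Semantic similarity" is stood in for by the cosine similarity of per-token features compared against a threshold. Token i may attend to token j only when both criteria hold. The helper names (`correspondence_mask`, `constrained_attention`, `masked_softmax`) and the threshold `sem_thresh` are illustrative and do not come from the paper.

```python
import numpy as np


def masked_softmax(scores, mask):
    # Disallowed pairs get -inf so their attention weight becomes exactly zero.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)


def correspondence_mask(geo_match, feats, sem_thresh=0.5):
    """Hypothetical constraint: token i may attend to token j only if j is a
    geometric correspondence of i AND their features have cosine similarity
    >= sem_thresh. Each token can always attend to itself."""
    unit = feats / np.linalg.norm(feats, axis=-1, keepdims=True)
    mask = geo_match & (unit @ unit.T >= sem_thresh)
    np.fill_diagonal(mask, True)
    return mask


def constrained_attention(q, k, v, mask):
    # Scaled dot-product attention restricted to the allowed (i, j) pairs.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return masked_softmax(scores, mask) @ v


# Toy example: 2 views x 3 pixels, flattened into N = 6 tokens.
n_views, n_pix, d = 2, 3, 4
N = n_views * n_pix
idx = np.arange(N)
# Assumed geometric correspondences: pixel p in view 0 <-> pixel p in view 1.
geo = (idx[:, None] % n_pix) == (idx[None, :] % n_pix)

rng = np.random.default_rng(0)
feats = rng.normal(size=(N, d))
feats[3:5] = feats[0:2]   # pixels 0 and 1 look alike in both views
feats[5] = -feats[2]      # pixel 2 disagrees semantically across views

mask = correspondence_mask(geo, feats)
out = constrained_attention(feats, feats, feats, mask)
```

In this toy setup, pixel 2 is a geometric match across the two views but fails the semantic check. Each of its two tokens is therefore restricted to attending to itself, and its output equals its own value vector. Pixels 0 and 1 pass both checks and exchange information across views. Intersecting a geometric mask with a feature-similarity mask is one simple way to express "geometric alignment plus semantic similarity". The paper's actual formulation may differ.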
