Abstract
Background consistency remains a significant challenge in image editing tasks. Despite extensive developments, existing works still face a trade-off between maintaining similarity to the original image and generating content that aligns with the target. Here, we propose KV-Edit, a training-free approach that uses the KV cache in DiTs to maintain background consistency: background tokens are preserved rather than regenerated, which eliminates the need for complex mechanisms or expensive training. The new content is generated within user-provided regions and integrates seamlessly with the background. We further explore the memory consumption of the KV cache during editing and optimize the space complexity to $O(1)$ using an inversion-free method. Our approach is compatible with any DiT-based generative model without additional training. Experiments demonstrate that KV-Edit significantly outperforms existing approaches in terms of both background and image quality, even surpassing training-based methods. The project webpage is available at https://xilluill.github.io/projectpages/KV-Edit