Abstract
Recent advancements in text-guided image editing have achieved notable success by leveraging natural language prompts for fine-grained semantic control. However, certain editing semantics are challenging to specify precisely using textual descriptions alone. A practical alternative involves learning editing semantics from paired source-target examples. Existing exemplar-based editing methods still rely on text prompts describing the change within paired examples or learning implicit text-based editing instructions. In this paper, we introduce PairEdit, a novel visual editing method designed to effectively learn complex editing semantics from a limited number of image pairs or even a single image pair, without using any textual guidance. We propose a target noise prediction that explicitly models semantic variations within paired images through a guidance direction term. Moreover, we introduce a content-preserving noise schedule to facilitate more effective semantic learning. We also propose optimizing distinct LoRAs to disentangle the learning of semantic variations from content. Extensive qualitative and quantitative evaluations demonstrate that PairEdit successfully learns intricate semantics while significantly improving content consistency compared to baseline methods. Code will be available at https://github.com/xudonmao/PairEdit.