Abstract
Style transfer involves transferring the style from a reference image to the content of a target image. Recent advancements in LoRA-based (Low-Rank Adaptation) methods have shown promise in effectively capturing the style of a single image. However, these approaches still face significant challenges such as content inconsistency, style misalignment, and content leakage. In this paper, we comprehensively analyze the limitations of the standard diffusion parameterization, which learns to predict noise, in the context of style transfer. To address these issues, we introduce ConsisLoRA, a LoRA-based method that enhances both content and style consistency by optimizing the LoRA weights to predict the original image rather than noise. We also propose a two-step training strategy that decouples the learning of content and style from the reference image. To effectively capture both the global structure and local details of the content image, we introduce a stepwise loss transition strategy. Additionally, we present an inference guidance method that enables continuous control over content and style strengths during inference. Through both qualitative and quantitative evaluations, our method demonstrates significant improvements in content and style consistency while effectively reducing content leakage.
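
To make the distinction between predicting noise and predicting the original image concrete, the following is a minimal sketch and not the ConsisLoRA implementation. It assumes the standard DDPM forward process x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, which the abstract does not spell out. The names `add_noise`, `x0_from_eps`, `eps_loss`, and `x0_loss` are hypothetical, and a perturbed copy of the true noise stands in for a network output.

```python
import math
import random


def add_noise(x0, eps, alpha_bar):
    """Assumed forward process: x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def x0_from_eps(x_t, eps_hat, alpha_bar):
    """Clean-image estimate implied by a noise prediction eps_hat."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - b * e) / a for xt, e in zip(x_t, eps_hat)]


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)


def eps_loss(eps_hat, eps):
    """Standard parameterization: compare predicted noise with true noise."""
    return mse(eps_hat, eps)


def x0_loss(eps_hat, x_t, x0, alpha_bar):
    """Image-space objective: compare the implied clean image with x0."""
    return mse(x0_from_eps(x_t, eps_hat, alpha_bar), x0)


# Toy data: a 16-value "image", Gaussian noise, and one noise level.
rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]
eps = [rng.gauss(0.0, 1.0) for _ in range(16)]
alpha_bar = 0.3
x_t = add_noise(x0, eps, alpha_bar)

# Stand-in for a network's noise prediction (assumption: small Gaussian error).
eps_hat = [e + rng.gauss(0.0, 0.1) for e in eps]

noise_space = eps_loss(eps_hat, eps)
image_space = x0_loss(eps_hat, x_t, x0, alpha_bar)
```

Under this assumed forward process the two losses differ only by the factor (1 - alpha_bar) / alpha_bar, so an image-space objective weights heavily noised timesteps (small alpha_bar) more strongly than the noise-space objective does. How ConsisLoRA schedules or combines such losses is described in the paper body, not here.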