Abstract
Diffusion models entangle content and style generation during the denoising process, leading to undesired content modification when directly applied to stylization tasks. Existing methods struggle to effectively control the diffusion model to meet the aesthetic-level requirements for stylization. In this paper, we introduce \textbf{Artist}, a training-free approach that aesthetically controls the content and style generation of a pretrained diffusion model for text-driven stylization. Our key insight is to disentangle the denoising of content and style into separate diffusion processes while sharing information between them. We propose simple yet effective content and style control methods that suppress style-irrelevant content generation, resulting in harmonious stylization results. Extensive experiments demonstrate that our method excels at meeting aesthetic-level stylization requirements, preserving intricate details in the content image and aligning well with the style prompt. Furthermore, we showcase the high controllability of the stylization strength from various perspectives. Code will be released. Project home page: https://DiffusionArtist.github.io