StyleDrop: Text-to-Image Generation in Any Style

Abstract

Pre-trained large text-to-image models synthesize impressive images with an appropriate use of text prompts. However, ambiguities inherent in natural language and out-of-distribution effects make it hard to synthesize image styles that leverage a specific design pattern, texture or material. In this paper, we introduce StyleDrop, a method that enables the synthesis of images that faithfully follow a specific style using a text-to-image model. The proposed method is extremely versatile and captures nuances and details of a user-provided style, such as color schemes, shading, design patterns, and local and global effects. It efficiently learns a new style by fine-tuning very few trainable parameters (less than $1\%$ of total model parameters) and improving the quality via iterative training with either human or automated feedback. Better yet, StyleDrop is able to deliver impressive results even when the user supplies only a single image that specifies the desired style. An extensive study shows that, for the task of style tuning text-to-image models, StyleDrop implemented on Muse convincingly outperforms other methods, including DreamBooth and textual inversion on Imagen or Stable Diffusion. More results are available at our project website: https://styledrop.github.io
