StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators

Abstract

Can a generative model be trained to produce images from a specific domain, guided by a text prompt only, without seeing any image? In other words: can an image generator be trained blindly? Leveraging the semantic power of large-scale Contrastive-Language-Image-Pre-training (CLIP) models, we present a text-driven method that allows shifting a generative model to new domains, without having to collect even a single image from those domains. We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains characterized by diverse styles and shapes. Notably, many of these modifications would be difficult or outright impossible to reach with existing methods. We conduct an extensive set of experiments and comparisons across a wide range of domains. These demonstrate the effectiveness of our approach and show that our shifted models maintain the latent-space properties that make generative models appealing for downstream tasks.
