Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Abstract

We introduce a method that automatically segments images into semantically meaningful regions without human supervision. The derived regions are consistent across different images and coincide with human-defined semantic classes on some datasets. In cases where semantic regions might be hard for humans to define and label consistently, our method is still able to find meaningful and consistent semantic classes. Our method uses a pretrained StyleGAN2~\cite{karras2020analyzing} generative model: clustering in the feature space of the generative model allows us to discover semantic classes. Once the classes are discovered, a synthetic dataset of generated images and corresponding segmentation masks can be created. A segmentation model is then trained on the synthetic dataset and is able to generalize to real images. Additionally, by using CLIP~\cite{radford2021learning}, we are able to use natural-language prompts to discover desired semantic classes. We test our method on publicly available datasets and show state-of-the-art results.
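
To make the clustering step concrete, the following is a minimal sketch of grouping per-pixel feature vectors into classes that are shared across images. It rests on several assumptions that do not come from the abstract: the clustering algorithm is plain k-means (the abstract says only "clustering"), the features are synthetic random arrays standing in for intermediate StyleGAN2 activations, and the names `kmeans` and `cluster_feature_maps` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def kmeans(points, k, n_iters=20, seed=0):
    """Plain Lloyd's k-means on an (M, C) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct randomly chosen points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=np.int64)
    for _ in range(n_iters):
        # Squared Euclidean distance from every point to every centroid.
        diffs = points[:, None, :] - centroids[None, :, :]
        dists = (diffs ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def cluster_feature_maps(features, k):
    """Cluster an (N, H, W, C) feature stack into (N, H, W) integer masks.

    All pixels of all images are clustered jointly, so a given label
    refers to the same cluster in every image.
    """
    n, h, w, c = features.shape
    flat = features.reshape(-1, c)
    _, labels = kmeans(flat, k)
    return labels.reshape(n, h, w)


# Assumption: synthetic features stand in for generator activations.
# The left and right halves of each "image" have clearly different features.
rng = np.random.default_rng(0)
features = rng.normal(0.0, 0.05, size=(4, 16, 16, 8))
features[:, :, 8:, :] += 5.0

masks = cluster_feature_maps(features, k=2)
print(masks.shape)
print(masks[0, 0])  # one row of the first mask: two contiguous label runs
```

Because the pixels of all images are clustered together, each cluster index acts as a candidate class label that carries over from one image to the next, which is the property a synthetic image-and-mask dataset would rely on. Which generator layers supply the features and how many clusters to use are left open here.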
