DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

Abstract

We introduce DatasetGAN: an automatic procedure to generate massive datasets of high-quality semantically segmented images requiring minimal human effort. Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets, which are time-consuming to annotate. Our method relies on the power of recent GANs to generate realistic images. We show how the GAN latent code can be decoded to produce a semantic segmentation of the image. Training the decoder only needs a few labeled examples to generalize to the rest of the latent space, resulting in an infinite annotated dataset generator! These generated datasets can then be used for training any computer vision architecture just as real datasets are. As only a few images need to be manually segmented, it becomes possible to annotate images in extreme detail and generate datasets with rich object and part segmentations. To showcase the power of our approach, we generated datasets for 7 image segmentation tasks, which include pixel-level labels for 34 human face parts and 32 car parts. Our approach significantly outperforms all semi-supervised baselines and is on par with fully supervised methods, which in some cases require as much as 100x more annotated data than our method.
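The abstract does not say how the decoder is built. The sketch below assumes one simple design: upsample the generator's intermediate feature maps to the output resolution, concatenate them into one feature vector per pixel, and train a per-pixel classifier on a handful of labeled images. Everything here is illustrative: `toy_generator` is a hypothetical stand-in for a GAN whose features are constructed to encode the object layout, and a linear softmax classifier replaces whatever per-pixel network the full method uses. The names `upsample_nearest`, `pixel_features`, `train_softmax` and `predict` are likewise hypothetical.

```python
import numpy as np

SIZE = 16       # output resolution of the toy generator (assumption)
N_CLASSES = 3   # 0 = background, 1 = object, 2 = object part (toy labels)


def toy_generator(z, rng):
    """Hypothetical GAN stand-in: returns multi-resolution feature maps and
    the label map they encode. z = (row, col) of the object centre."""
    r, c = z
    rows, cols = np.mgrid[0:SIZE, 0:SIZE]
    dist = np.maximum(np.abs(rows - r), np.abs(cols - c))  # Chebyshev distance
    labels = np.where(dist <= 1, 2, np.where(dist <= 4, 1, 0))
    # High-resolution features: noisy one-hot layout, shape (3, 16, 16).
    hi = np.eye(N_CLASSES)[labels].transpose(2, 0, 1)
    hi = hi + 0.05 * rng.standard_normal(hi.shape)
    # Low-resolution features: 4x4 average-pooled object mask, shape (1, 4, 4).
    mask = (labels > 0).astype(float)
    pooled = mask.reshape(4, 4, 4, 4).mean(axis=(1, 3))
    lo = (pooled + 0.05 * rng.standard_normal(pooled.shape))[None]
    return [lo, hi], labels


def upsample_nearest(fmap, size):
    """Nearest-neighbour upsampling of a (C, h, w) map to (C, size, size)."""
    _, h, w = fmap.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return fmap[:, rows][:, :, cols]


def pixel_features(fmaps, size):
    """Upsample every feature map, stack along channels, and return one
    feature vector per pixel in row-major order: shape (size * size, C)."""
    stacked = np.concatenate([upsample_nearest(f, size) for f in fmaps], axis=0)
    return stacked.reshape(stacked.shape[0], -1).T


def train_softmax(x, y, n_classes, lr=0.5, steps=500):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    w = np.zeros((x.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(x)  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(x, w, b):
    """Most likely class for each pixel feature vector."""
    return np.argmax(x @ w + b, axis=1)


rng = np.random.default_rng(0)
# "A few labeled examples": three generated samples annotated by hand.
train = [toy_generator(z, rng) for z in [(5, 5), (10, 8), (6, 11)]]
x_train = np.concatenate([pixel_features(f, SIZE) for f, _ in train])
y_train = np.concatenate([lab.ravel() for _, lab in train])
w, b = train_softmax(x_train, y_train, N_CLASSES)

# A new latent code yields feature maps; the trained decoder labels them.
feats, true_labels = toy_generator((8, 4), rng)
pred = predict(pixel_features(feats, SIZE), w, b).reshape(SIZE, SIZE)
print("pixel accuracy on an unseen latent:", (pred == true_labels).mean())
```

Once trained, the classifier can label the features of any newly sampled latent code, which is how a few annotations can become an unbounded stream of labeled samples. With a real GAN the features would come from the generator's own layers and may not be linearly separable, so a nonlinear per-pixel classifier such as a small MLP could be needed; that choice is an assumption of this sketch.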
