Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals

Abstract

Being able to learn dense semantic representations of images without supervision is an important problem in computer vision. However, despite its significance, this problem remains rather unexplored, with a few exceptions that considered unsupervised semantic segmentation on small-scale datasets with a narrow visual domain. In this paper, we make a first attempt to tackle the problem on datasets that have been traditionally utilized for the supervised case. To achieve this, we introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings. This marks a large deviation from existing works that relied on proxy tasks or end-to-end clustering. Additionally, we argue about the importance of having a prior that contains information about objects, or their parts, and discuss several possibilities to obtain such a prior in an unsupervised manner. Extensive experimental evaluation shows that the proposed method comes with key advantages over existing works. First, the learned pixel embeddings can be directly clustered in semantic groups using K-Means. Second, the method can serve as an effective unsupervised pre-training for the semantic segmentation task. In particular, when fine-tuning the learned representations using just 1% of labeled examples on PASCAL, we outperform supervised ImageNet pre-training by 7.1% mIoU. The code is available at https://github.com/wvangansbeke/Unsupervised-Semantic-Segmentation.
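
To make the first claimed advantage concrete, the following is a minimal sketch of clustering per-pixel embeddings with K-Means to obtain a segmentation map. It is not taken from the released code: the helper names `kmeans` and `cluster_pixels`, the farthest-point initialisation, and the toy 2-dimensional embeddings are all assumptions made for illustration, and the contrastive training that would produce real embeddings is not shown. It uses only the Python standard library so that it runs as-is.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-Means on a list of equal-length float vectors."""
    # Deterministic farthest-point initialisation (an illustrative choice,
    # not something the abstract specifies).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest center for every embedding.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centers


def cluster_pixels(embedding_map, k):
    """Cluster an H x W grid of D-dim embeddings into an H x W label grid."""
    h, w = len(embedding_map), len(embedding_map[0])
    flat = [vec for row in embedding_map for vec in row]
    labels, _ = kmeans(flat, k)
    return [labels[r * w:(r + 1) * w] for r in range(h)]


# Toy 2 x 4 "image" (hypothetical values): the left half has embeddings near
# (1, 0) and the right half near (0, 1), standing in for two semantic groups.
toy = [
    [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]],
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
]
segmentation = cluster_pixels(toy, k=2)
print(segmentation)
```

On this toy input the two halves of the grid fall into separate clusters. With real embeddings one would typically run an optimized K-Means implementation over the pixels of a whole dataset rather than a single image, and the resulting cluster indices would still need to be matched to semantic classes for evaluation.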
