Unsupervised Part Discovery from Contrastive Reconstruction

Abstract

The goal of self-supervised visual representation learning is to learn strong, transferable image representations, with the majority of research focusing on the object or scene level. Representation learning at the part level, on the other hand, has received significantly less attention. In this paper, we propose an unsupervised approach to object part discovery and segmentation and make three contributions. First, we construct a proxy task through a set of objectives that encourages the model to learn a meaningful decomposition of the image into its parts. Second, prior work argues for reconstructing or clustering pre-computed features as a proxy for parts; we show empirically that this alone is unlikely to find meaningful parts, mainly because of the features' low resolution and the tendency of classification networks to spatially smear out information. We suggest that image reconstruction at the level of pixels can alleviate this problem, acting as a complementary cue. Lastly, we show that the standard evaluation based on keypoint regression does not correlate well with segmentation quality and thus introduce different metrics, normalized mutual information (NMI) and adjusted Rand index (ARI), that better characterize the decomposition of objects into parts. Our method yields semantic parts which are consistent across fine-grained but visually distinct categories, outperforming the state of the art on three benchmark datasets. Code is available at the project page: https://www.robots.ox.ac.uk/~vgg/research/unsup-parts/.
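NMI and ARI suit unsupervised part discovery because both compare two labelings of the same pixels while ignoring which integer ID each part receives, so a discovered "part 3" can match a ground-truth "head" without any manual assignment. The following is a minimal pure-Python sketch of the standard definitions, not the authors' evaluation code. It assumes the predicted and ground-truth part maps have already been flattened into equal-length lists of integer labels over the pixels being scored; whether background pixels are included, and which NMI normalization the paper uses, are assumptions here (this sketch divides by the arithmetic mean of the two entropies, which is scikit-learn's default).

```python
from collections import Counter
from math import comb, log


def _contingency(pred, gt):
    """Count (predicted part, ground-truth part) pairs and each labeling's part sizes."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must label the same number of pixels")
    return Counter(zip(pred, gt)), Counter(pred), Counter(gt)


def nmi(pred, gt):
    """Normalized mutual information, normalized by the mean of the two entropies."""
    n = len(pred)
    joint, pred_counts, gt_counts = _contingency(pred, gt)
    # Mutual information: sum over co-occurring label pairs of p(u, v) * log(p(u, v) / (p(u) p(v))).
    mi = sum(
        (c / n) * log(n * c / (pred_counts[p] * gt_counts[g]))
        for (p, g), c in joint.items()
    )
    h_pred = -sum((c / n) * log(c / n) for c in pred_counts.values())
    h_gt = -sum((c / n) * log(c / n) for c in gt_counts.values())
    denom = (h_pred + h_gt) / 2
    # Both labelings are a single part: treat as perfect agreement.
    return 1.0 if denom == 0 else mi / denom


def ari(pred, gt):
    """Adjusted Rand index: pair-counting agreement, corrected for chance."""
    n = len(pred)
    if n < 2:
        raise ValueError("ARI needs at least two pixels")
    joint, pred_counts, gt_counts = _contingency(pred, gt)
    index = sum(comb(c, 2) for c in joint.values())  # pixel pairs grouped together in both
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    sum_gt = sum(comb(c, 2) for c in gt_counts.values())
    expected = sum_pred * sum_gt / comb(n, 2)  # expected index under random labeling
    max_index = (sum_pred + sum_gt) / 2
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)
```

For a NumPy label map, a call such as `ari(pred_mask.ravel().tolist(), gt_mask.ravel().tolist())` scores one image. Swapping the IDs of two predicted parts leaves both scores unchanged. For large images, scikit-learn's `normalized_mutual_info_score` and `adjusted_rand_score` compute the same quantities more efficiently.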
