Automatically Discovering and Learning New Visual Categories with Ranking Statistics

Abstract

We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes. This setting is similar to semi-supervised learning, but significantly harder because there are no labelled examples for the new classes. The challenge, then, is to leverage the information contained in the labelled images in order to learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data. In this work we address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using the labelled data only introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use rank statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and, (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data, and the clustering of the unlabelled data. We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
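
The abstract does not spell out how rank statistics are computed, so the following is only a minimal sketch of one plausible reading of idea (2): two unlabelled images are treated as a positive pair when the indices of their k largest feature activations coincide. The function names, the choice of k, and the toy feature vectors are all hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List, Sequence


def top_k_indices(features: Sequence[float], k: int) -> FrozenSet[int]:
    """Return the indices of the k largest entries of a feature vector.

    Assumption: entries are ranked by value; ties go to the lower index
    because Python's sort is stable.
    """
    order = sorted(range(len(features)), key=lambda i: features[i], reverse=True)
    return frozenset(order[:k])


def pairwise_pseudo_labels(batch: List[Sequence[float]], k: int = 5) -> List[List[int]]:
    """Build a pairwise pseudo-label matrix from rank statistics.

    Entry [i][j] is 1 when samples i and j share the same top-k index set,
    and 0 otherwise. This is an illustrative rule, not the paper's code.
    """
    tops = [top_k_indices(f, k) for f in batch]
    return [[int(a == b) for b in tops] for a in tops]


if __name__ == "__main__":
    # Toy 4-dimensional features for three hypothetical unlabelled images.
    feats = [
        [0.9, 0.1, 0.8, 0.0],
        [0.7, 0.2, 0.95, 0.1],
        [0.1, 0.9, 0.0, 0.8],
    ]
    print(pairwise_pseudo_labels(feats, k=2))
    # prints [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```

In a training loop, a matrix like this could serve as the target for a pairwise similarity loss on the unlabelled subset, alongside an ordinary classification loss on the labelled subset. How the two terms are weighted is not stated in the abstract.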
