Object-Aware Cropping for Self-Supervised Learning

Abstract

A core component of the recent success of self-supervised learning is cropping data augmentation, which selects sub-regions of an image to be used as positive views in the self-supervised loss. The underlying assumption is that randomly cropped and resized regions of a given image share information about the objects of interest, which the learned representation will capture. This assumption is mostly satisfied in datasets such as ImageNet where there is a large, centered object, which is highly likely to be present in random crops of the full image. However, in other datasets such as OpenImages or COCO, which are more representative of real world uncurated data, there are typically multiple small objects in an image. In this work, we show that self-supervised learning based on the usual random cropping performs poorly on such datasets. We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm. This encourages the model to learn both object and scene level semantic representations. Using this approach, which we call object-aware cropping, results in significant improvements over scene cropping on classification and object detection benchmarks. For example, on OpenImages, our approach achieves an improvement of 8.8% mAP over random scene-level cropping using MoCo-v2 based pre-training. We also show significant improvements on COCO and PASCAL-VOC object detection and segmentation tasks over the state-of-the-art self-supervised learning approaches. Our approach is efficient, simple and general, and can be used in most existing contrastive and non-contrastive self-supervised learning frameworks.
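
The view-selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `min_scale` parameter, the `replace_both` flag, and the fallback to random crops when no proposals exist are all assumptions made for illustration. The proposal boxes are assumed to come from any object proposal algorithm, as `(x0, y0, x1, y1)` pixel coordinates.

```python
import random


def random_crop_box(width, height, rng, min_scale=0.2):
    """Sample a random scene-level crop box (x0, y0, x1, y1) inside the image.

    min_scale is a hypothetical lower bound on the fraction of image area kept.
    """
    scale = rng.uniform(min_scale, 1.0)
    side = scale ** 0.5  # scale both sides equally so area shrinks by `scale`
    w = max(1, int(width * side))
    h = max(1, int(height * side))
    x0 = rng.randint(0, width - w)
    y0 = rng.randint(0, height - h)
    return (x0, y0, x0 + w, y0 + h)


def clamp_box(box, width, height):
    """Clip a proposal box to the image bounds."""
    x0, y0, x1, y1 = box
    return (max(0, x0), max(0, y0), min(width, x1), min(height, y1))


def object_aware_boxes(width, height, proposals, rng, replace_both=False):
    """Return two crop boxes to be used as positive views.

    One random crop is replaced by an object proposal; with replace_both=True
    the second crop is also drawn from the proposals. If no proposals are
    available, fall back to two random crops (an assumption of this sketch).
    """
    if not proposals:
        return (random_crop_box(width, height, rng),
                random_crop_box(width, height, rng))
    first = clamp_box(rng.choice(proposals), width, height)
    if replace_both:
        second = clamp_box(rng.choice(proposals), width, height)
    else:
        second = random_crop_box(width, height, rng)
    return first, second
```

In a real pipeline, each returned box would be cropped from the image, resized, and passed through the remaining augmentations before being fed to the self-supervised loss (for example, a MoCo-v2 style contrastive loss).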
