Abstract
Contrastive learning has achieved remarkable progress in unsupervised representation learning for single object-centric images, but suffers from inferior performance on the widespread images that contain multiple objects. In this paper, we propose a simple but effective method, Multiple Object Stitching (MOS), to refine the unsupervised representation for multi-object images. Specifically, we construct multi-object images by stitching together single object-centric ones, so that the objects in the synthesized multi-object images are predetermined. Hence, compared to existing contrastive methods, our method provides additional object correspondences between multi-object images without human annotations. In this manner, our method pays more attention to the representation of each object in a multi-object image, thus providing more detailed representations for complicated downstream tasks, such as object detection and semantic segmentation. Experimental results on the ImageNet, CIFAR and COCO datasets demonstrate that our proposed method achieves leading unsupervised representation performance on both single object-centric images and multi-object ones. The source code is available at https://github.com/visresearch/MultipleObjectStitching.
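
To make the stitching idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a square grid of equally sized images; the function name `stitch_grid`, the `grid` parameter, and the returned `layout` array are hypothetical names introduced here for illustration. The point it shows is that, because the stitched image is built from known sources, the position of every object is known without any annotation.

```python
import numpy as np


def stitch_grid(images, grid=2):
    """Stitch grid*grid equally sized HxWxC images into one larger image.

    Returns the stitched image and a (grid, grid) array whose entry (r, c)
    is the index of the source image placed in that cell. This layout is
    an assumed, simplified stand-in for the object correspondences.
    """
    if len(images) != grid * grid:
        raise ValueError("expected grid * grid images")
    shape = images[0].shape
    if any(img.shape != shape for img in images):
        raise ValueError("all images must share one shape")

    # Join each group of `grid` images side by side, then stack the rows.
    rows = [
        np.concatenate(images[r * grid:(r + 1) * grid], axis=1)
        for r in range(grid)
    ]
    stitched = np.concatenate(rows, axis=0)

    # Cell (r, c) holds source image r * grid + c.
    layout = np.arange(grid * grid).reshape(grid, grid)
    return stitched, layout


# Four dummy "single-object" images, each filled with its own index.
tiles = [np.full((4, 4, 3), i, dtype=np.uint8) for i in range(4)]
stitched, layout = stitch_grid(tiles)
```

Two stitched images that share a source image then contain the same object at known cells, which is the kind of correspondence a contrastive objective could use as a positive pair under these assumptions.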