A Simple Framework for Contrastive Learning of Visual Representations

Abstract

This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. To understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, a 7% relative improvement over the previous state of the art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
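The abstract does not spell out the objective. As a hedged sketch, assuming the NT-Xent (normalized temperature-scaled cross-entropy) loss and the one-hidden-layer MLP projection head used in the full SimCLR paper, the pieces behind finding (2) fit together as below. Pure Python is used for readability; the function names, weight shapes, the temperature of 0.5, and the noise-based stand-in for data augmentation are illustrative assumptions, not the authors' code. The demo also skips the image encoder and perturbs the representation h directly.

```python
import math
import random


def relu(v):
    # Elementwise ReLU nonlinearity.
    return [max(0.0, x) for x in v]


def matvec(W, v):
    # Dense matrix-vector product; W is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def projection_head(h, W1, W2):
    # Hypothetical nonlinear projection g(h) = W2 * ReLU(W1 * h): maps the
    # representation h to the space where the contrastive loss is applied.
    return matvec(W2, relu(matvec(W1, h)))


def cosine(u, v, eps=1e-12):
    # Cosine similarity; eps guards against an all-zero projection.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, eps)


def nt_xent(z, tau=0.5):
    # NT-Xent loss over 2N projections, assuming z[2k] and z[2k + 1] are the
    # two augmented views of example k. Every other sample in the batch acts
    # as a negative, so each anchor sees 2(N - 1) negatives.
    n2 = len(z)
    assert n2 >= 2 and n2 % 2 == 0, "need an even number of views"
    total = 0.0
    for i in range(n2):
        j = i ^ 1  # positive partner: pairs are (0, 1), (2, 3), ...
        logits = {k: cosine(z[i], z[k]) / tau for k in range(n2) if k != i}
        m = max(logits.values())  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        total += log_denom - logits[j]  # -log softmax probability of the positive
    return total / n2  # average over both orderings of every positive pair


if __name__ == "__main__":
    random.seed(0)
    dim, hidden, out, n = 8, 8, 4, 4
    W1 = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(hidden)]
    W2 = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(out)]
    z = []
    for _ in range(n):
        h = [random.gauss(0, 1) for _ in range(dim)]
        # Stand-in for two data augmentations: small Gaussian perturbations of h.
        for _ in range(2):
            view = [x + random.gauss(0, 0.1) for x in h]
            z.append(projection_head(view, W1, W2))
    print(f"NT-Xent loss for a batch of {n} examples: {nt_xent(z):.4f}")
```

Because every other example in the batch serves as a negative, a batch of N examples gives each anchor 2(N - 1) negatives; this is one reading of why finding (3) favors larger batches. In this sketch the loss is lower when the two views of each example project to similar directions and different examples project apart.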
