Understanding Dimensional Collapse in Contrastive Self-supervised Learning

Abstract

Self-supervised visual representation learning aims to learn useful representations without relying on human annotations. Joint embedding approaches are based on maximizing the agreement between embedding vectors from different views of the same image. Various methods have been proposed to solve the collapsing problem, where all embedding vectors collapse to a trivial constant solution. Among these methods, contrastive learning prevents collapse via negative sample pairs. It has been shown that non-contrastive methods suffer from a lesser collapse problem of a different nature: dimensional collapse, whereby the embedding vectors end up spanning a lower-dimensional subspace instead of the entire available embedding space. Here, we show that dimensional collapse also happens in contrastive learning. In this paper, we shed light on the dynamics at play in contrastive learning that lead to dimensional collapse. Inspired by our theory, we propose a novel contrastive learning method, called DirectCLR, which directly optimizes the representation space without relying on a trainable projector. Experiments show that DirectCLR outperforms SimCLR with a trainable linear projector on ImageNet.
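
To make the notion of dimensional collapse concrete, the following is a minimal sketch of one way it could be detected: compute the singular values of the covariance matrix of a batch of embedding vectors and count how many are non-negligible. The helper names (`singular_value_spectrum`, `effective_rank`), the tolerance `rel_tol`, and the synthetic Gaussian data are assumptions made for illustration only; they are not taken from the paper, and the paper's own diagnostics may differ.

```python
import numpy as np


def singular_value_spectrum(embeddings):
    """Singular values (descending) of the covariance of an (n, d) embedding matrix."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / embeddings.shape[0]
    return np.linalg.svd(cov, compute_uv=False)


def effective_rank(singular_values, rel_tol=1e-6):
    """Number of singular values larger than rel_tol times the largest one.

    rel_tol is an illustrative threshold, not a value from the paper.
    """
    largest = singular_values[0]
    if largest == 0:
        # Complete collapse: every embedding is the same constant vector.
        return 0
    return int(np.sum(singular_values > rel_tol * largest))


rng = np.random.default_rng(0)
n, d, k = 1000, 16, 4

# Healthy case: synthetic embeddings that use all d dimensions.
full = rng.standard_normal((n, d))

# Dimensional collapse: embeddings confined to a k-dimensional subspace of R^d.
basis = rng.standard_normal((k, d))
collapsed = rng.standard_normal((n, k)) @ basis

# Complete collapse: all embeddings equal to one constant vector.
constant = np.ones((n, d))

full_rank = effective_rank(singular_value_spectrum(full))
collapsed_rank = effective_rank(singular_value_spectrum(collapsed))
constant_rank = effective_rank(singular_value_spectrum(constant))

print(full_rank, collapsed_rank, constant_rank)
```

In this toy setting, the dimensionally collapsed embeddings occupy only `k` of the `d` available directions, while the constant embeddings occupy none. With real encoder outputs the spectrum would typically decay gradually rather than drop exactly to zero, so a plot of the log singular values would likely be more informative than a single rank count.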
