Understanding Collapse in Non-Contrastive Learning

Abstract

Contrastive methods have led a recent surge in the performance of self-supervised representation learning (SSL). Recent methods like BYOL or SimSiam purportedly distill these contrastive methods down to their essence, removing bells and whistles, including the negative examples, that do not contribute to downstream performance. These "non-contrastive" methods work surprisingly well without using negatives even though the global minimum lies at trivial collapse. We empirically analyze these non-contrastive methods and find that SimSiam is extraordinarily sensitive to dataset and model size. In particular, SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size. We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels. We further analyze architectural design choices and their effect on the downstream performance. Finally, we demonstrate that shifting to a continual learning setting acts as a regularizer and prevents collapse, and a hybrid between continual and multi-epoch training can improve linear probe accuracy by as many as 18 percentage points using ResNet-18 on ImageNet.
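
The abstract does not define the proposed collapse metric, so the sketch below is only a generic illustration of how partial dimensional collapse can be quantified from embeddings alone, without labels or fine-tuning. It computes an entropy-based effective rank of the singular value spectrum, which is an assumption chosen for illustration and not necessarily the metric used in the paper. The function name `effective_rank` and the synthetic data are hypothetical.

```python
import numpy as np


def effective_rank(embeddings: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of an (n_samples, dim) embedding matrix.

    Illustrative stand-in for a collapse measure (an assumption, not the
    paper's metric): values near `dim` suggest the representation uses all
    dimensions, while much smaller values suggest dimensional collapse.
    """
    # Center the features so the spectrum reflects variance, not the mean.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Singular values of the centered data matrix.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    # Normalize the spectrum into a probability distribution.
    p = singular_values / (singular_values.sum() + eps)
    # Exponential of the Shannon entropy gives the effective rank.
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))


rng = np.random.default_rng(0)

# Synthetic "healthy" embeddings: variance spread over all 16 dimensions.
healthy = rng.normal(size=(512, 16))

# Synthetic "partially collapsed" embeddings: 16-dimensional vectors that
# actually lie in a 2-dimensional subspace.
collapsed = rng.normal(size=(512, 2)) @ rng.normal(size=(2, 16))

print("healthy effective rank:  ", effective_rank(healthy))
print("collapsed effective rank:", effective_rank(collapsed))
```

On the synthetic data, the collapsed embeddings yield an effective rank of at most about 2 even though they nominally have 16 dimensions, which is the kind of label-free signal the abstract describes using to forecast downstream performance.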
