Abstract
Self-supervised learning has emerged as a strategy to reduce the reliance on costly supervised signals by pretraining representations using only unlabeled data. These methods combine heuristic proxy classification tasks with data augmentations and have achieved significant success, but our theoretical understanding of this success remains limited. In this paper we analyze self-supervised representation learning using a causal framework. We show how data augmentations can be used more effectively through explicit invariance constraints on the proxy classifiers employed during pretraining. Based on this, we propose a novel self-supervised objective, Representation Learning via Invariant Causal Mechanisms (ReLIC), that enforces invariant prediction of proxy targets across augmentations through an invariance regularizer, which yields improved generalization guarantees. Further, using causality, we generalize contrastive learning, a particular kind of self-supervised method, and provide an alternative theoretical explanation for the success of these methods. Empirically, ReLIC significantly outperforms competing methods in terms of robustness and out-of-distribution generalization on ImageNet, while also significantly outperforming these methods on Atari, achieving above human-level performance on 51 out of 57 games.
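The idea of enforcing invariant proxy predictions across augmentations can be illustrated with a minimal NumPy sketch. It assumes an instance-discrimination proxy task (each image in a batch is its own class), temperature-scaled cosine-similarity logits, and a KL-divergence penalty between the proxy predictions obtained when the roles of the two augmented views are swapped. The names `relic_style_loss`, `tau`, and `alpha` are hypothetical, and the sketch is not the paper's exact formulation: it leaves out any architecture, training loop, or implementation details the paper may use.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def normalize(x):
    # L2-normalize each row so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def relic_style_loss(z1, z2, tau=0.1, alpha=1.0):
    """Hypothetical sketch: proxy-task loss plus an invariance regularizer.

    z1, z2: (N, D) embeddings of the same N images under two augmentations.
    The proxy target for row i is "which image in the batch is image i".
    Returns (total_loss, proxy_loss, invariance_penalty).
    """
    z1, z2 = normalize(z1), normalize(z2)
    # Proxy-task logits with view 1 as anchor, and with view 2 as anchor.
    logp_12 = log_softmax(z1 @ z2.T / tau)
    logp_21 = log_softmax(z2 @ z1.T / tau)
    idx = np.arange(z1.shape[0])
    # Cross-entropy for predicting the matching instance (the diagonal), both directions.
    proxy = -(logp_12[idx, idx].mean() + logp_21[idx, idx].mean()) / 2
    # Invariance penalty: mean KL(p_12 || p_21) between the two proxy-prediction distributions.
    p_12 = np.exp(logp_12)
    kl = (p_12 * (logp_12 - logp_21)).sum(axis=-1).mean()
    return proxy + alpha * kl, proxy, kl


rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
total, proxy, kl = relic_style_loss(z_a, z_b, alpha=2.0)
print(f"total={total:.4f} proxy={proxy:.4f} kl={kl:.4f}")
```

Setting `alpha=0` leaves only a symmetric InfoNCE-style instance-discrimination loss; the `alpha`-weighted term is what penalizes the proxy predictions for changing when the augmentation changes.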