Abstract
Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance. But what if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positive pairs with no apparent shared information. In this work, we propose a new contrastive loss function that is robust against noisy views. We provide rigorous theoretical justifications by showing connections to robust symmetric losses for noisy binary classification and by establishing a new contrastive bound for mutual information maximization based on the Wasserstein distance measure. The proposed loss is completely modality-agnostic and a simple drop-in replacement for the InfoNCE loss, which makes it easy to apply to existing contrastive frameworks. We show that our approach provides consistent improvements over the state-of-the-art on image, video, and graph contrastive learning benchmarks that exhibit a variety of real-world noise patterns.