Abstract
Contrastive learning methods have significantly narrowed the gap between supervised and unsupervised learning on computer vision tasks. In this paper, we explore their application to remote sensing, where unlabeled data is often abundant but labeled data is scarce. We first show that due to their different characteristics, a non-trivial gap persists between contrastive and supervised learning on standard benchmarks. To close the gap, we propose novel training methods that exploit the spatiotemporal structure of remote sensing data. We leverage spatially aligned images over time to construct temporal positive pairs in contrastive learning and geo-location to design pre-text tasks. Our experiments show that our proposed method closes the gap between contrastive and supervised learning on image classification, object detection and semantic segmentation for remote sensing and other geo-tagged image datasets.
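To make the idea of temporal positive pairs concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive loss in which the positive for each image is an embedding of the same location captured at a different time. It is illustrative only and is not claimed to be the exact objective used in this paper: the function name `temporal_info_nce`, the `temperature` value, and the use of in-batch negatives are all assumptions made for the example.

```python
import numpy as np


def temporal_info_nce(z_t1, z_t2, temperature=0.1):
    """Generic InfoNCE loss over temporal positive pairs (hypothetical helper).

    Row i of z_t1 and row i of z_t2 are assumed to be embeddings of the same
    location at two different times. All other rows of z_t2 act as negatives.
    """
    # L2-normalise so that dot products are cosine similarities.
    z_t1 = z_t1 / np.linalg.norm(z_t1, axis=1, keepdims=True)
    z_t2 = z_t2 / np.linalg.norm(z_t2, axis=1, keepdims=True)

    # Pairwise similarity matrix, shape (N, N), scaled by the temperature.
    logits = z_t1 @ z_t2.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The temporal positive for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


# Synthetic stand-ins for encoder outputs (assumed data, not real imagery).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
same_place_later = anchors + 0.01 * rng.normal(size=(8, 16))
mismatched = same_place_later[::-1].copy()  # pairs each anchor with another location

loss_aligned = temporal_info_nce(anchors, same_place_later)
loss_mismatched = temporal_info_nce(anchors, mismatched)
```

In this toy setup the loss should be lower when each anchor is paired with its own location at a later time than when the pairs are mismatched, which is the signal a temporal-positive objective relies on.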