Abstract
Training deep networks for semantic segmentation requires annotation of large amounts of data, which can be time-consuming and expensive. Unfortunately, these trained networks still generalize poorly when tested in domains not consistent with the training data. In this paper, we show that by carefully presenting a mixture of labeled source domain and proxy-labeled target domain data to a network, we can achieve state-of-the-art unsupervised domain adaptation results. With our design, the network progressively learns features specific to the target domain using annotation from only the source domain. We generate proxy labels for the target domain using the network's own predictions. Our architecture then allows selective mining of easy samples from this set of proxy labels, and hard samples from the annotated source domain. We conduct a series of experiments with the GTA5, Cityscapes and BDD100k datasets on synthetic-to-real domain adaptation and geographic domain adaptation, showing the advantages of our method over baselines and existing approaches.
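
To make the two mining steps concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes that "easy" target samples are pixels whose top softmax probability exceeds a confidence threshold, and that "hard" source samples are the pixels with the highest cross-entropy loss. The abstract specifies neither criterion, so the function names, the `IGNORE` label, `threshold`, and `keep_fraction` are all hypothetical.

```python
import math

IGNORE = -1  # hypothetical label for target pixels excluded from the loss


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss of one pixel against an integer class label."""
    return -math.log(softmax(logits)[label])


def proxy_labels(pixel_logits, threshold=0.9):
    """Assumed easy-sample criterion: keep the network's own prediction as a
    proxy label only where its confidence reaches `threshold`."""
    labels = []
    for logits in pixel_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= threshold else IGNORE)
    return labels


def easy_target_loss(pixel_logits, proxy):
    """Mean loss over target pixels that received a proxy label."""
    kept = [cross_entropy(l, y) for l, y in zip(pixel_logits, proxy) if y != IGNORE]
    return sum(kept) / len(kept) if kept else 0.0


def hard_source_loss(pixel_logits, labels, keep_fraction=0.5):
    """Assumed hard-sample criterion: average only the highest-loss fraction
    of annotated source pixels."""
    losses = sorted(
        (cross_entropy(l, y) for l, y in zip(pixel_logits, labels)), reverse=True
    )
    k = max(1, int(len(losses) * keep_fraction))
    return sum(losses[:k]) / k


# Toy usage: three target pixels (3 classes) and four source pixels (2 classes).
target_logits = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.2], [0.0, 6.0, 0.0]]
source_logits = [[4.0, 0.0], [0.0, 0.0], [0.0, 3.0], [2.0, 0.0]]
source_labels = [0, 0, 0, 0]

proxy = proxy_labels(target_logits)
total_loss = hard_source_loss(source_logits, source_labels) + easy_target_loss(
    target_logits, proxy
)
```

In this toy example the low-confidence middle target pixel is marked `IGNORE` and contributes nothing to the target loss, while only the two worst-predicted source pixels contribute to the source loss. Summing the two terms with equal weight is likewise an assumption made for illustration.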