Abstract
State-of-the-art stereo matching networks have difficulties in generalizing to new unseen environments due to significant domain differences, such as color, illumination, contrast, and texture. In this paper, we aim at designing a domain-invariant stereo matching network (DSMNet) that generalizes well to unseen scenes. To achieve this goal, we propose i) a novel "domain normalization" approach that regularizes the distribution of learned representations to allow them to be invariant to domain differences, and ii) a trainable non-local graph-based filter for extracting robust structural and geometric representations that can further enhance domain-invariant generalizations. When trained on synthetic data and generalized to real test sets, our model performs significantly better than all state-of-the-art models. It even outperforms some deep learning models (e.g. MC-CNN) fine-tuned with test-domain data.
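
To make the idea of regularizing the feature distribution concrete, the following is a minimal NumPy sketch of one plausible form of such a normalization. The abstract does not specify the operation, so everything here is an assumption for illustration: the function name `domain_norm`, the `(N, C, H, W)` layout, the `eps` constant, and the choice of per-sample spatial standardization followed by per-pixel L2 normalization across channels. It should not be read as the DSMNet implementation.

```python
import numpy as np


def domain_norm(x, eps=1e-5):
    """Illustrative normalization of a feature map x with shape (N, C, H, W).

    Assumed two-step scheme (not taken from the abstract):
      1. standardize each channel of each sample over its spatial axes,
         which removes per-channel scale and offset (e.g. contrast and
         brightness-like shifts);
      2. L2-normalize the channel vector at every pixel, so that every
         pixel's feature vector has approximately unit length.
    """
    # Step 1: per-sample, per-channel statistics over H and W.
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Step 2: per-pixel L2 norm across the channel axis.
    norm = np.sqrt((x_hat ** 2).sum(axis=1, keepdims=True))
    return x_hat / (norm + eps)


# Hypothetical usage: a random batch of 2 feature maps with 4 channels.
features = np.random.default_rng(0).normal(size=(2, 4, 8, 8))
normalized = domain_norm(features)
```

Under these assumptions, applying a positive per-channel gain and an offset to the input leaves the output almost unchanged, which is the kind of invariance to domain differences that the abstract describes.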