Semantic Correspondence with Transformers

  • 2021-06-04 14:39:03
  • Seokju Cho, Sunghwan Hong, Sangryul Jeon, Yunsung Lee, Kwanghoon Sohn, Seungryong Kim
  • 48


We propose a novel cost aggregation network, called Cost Aggregation with Transformers (CATs), to find dense correspondences between semantically similar images with additional challenges posed by large intra-class appearance and geometric variations. Previous methods addressing the cost aggregation stage are either hand-crafted, and thus lack robustness to severe deformations, or CNN-based, and thus inherit the limitation of CNNs that fail to discriminate incorrect matches due to limited receptive fields. In contrast, CATs explore global consensus across the initial correlation map with the help of architectural designs that allow us to exploit the full potential of the self-attention mechanism. Specifically, we include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation to benefit from hierarchical feature representations within a Transformer-based aggregator, and combine these with swapping self-attention and residual connections, not only to enforce consistent matching but also to ease the learning process. We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies. Code and trained models will be made available at
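
To make the terms in the abstract concrete, the following is a minimal NumPy sketch of two of the ideas it names: building an initial correlation map from features, and refining it with self-attention applied once to the map and once to its transpose (a simplified reading of "swapping self-attention"), each with a residual connection. This is not the authors' implementation. The function names, the single-head attention, the random projection weights, and the assumption that both images have the same number of spatial positions are all illustrative choices; appearance affinity modelling and multi-level aggregation are omitted.

```python
import numpy as np

def correlation_map(src, tgt, eps=1e-8):
    """Cosine similarity between every source and target feature vector.

    src: (N_s, D), tgt: (N_t, D)  ->  (N_s, N_t)
    """
    src = src / (np.linalg.norm(src, axis=1, keepdims=True) + eps)
    tgt = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + eps)
    return src @ tgt.T

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over the rows (tokens) of x."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)
    return attn @ v

def aggregate_with_swap(corr, weights):
    """Hypothetical aggregator: attend over the map, then over its transpose.

    Assumes a square map (N_s == N_t) so the same weights serve both passes.
    """
    # Pass 1: each source position is a token described by its target scores.
    out = corr + self_attention(corr, *weights)      # residual connection
    # Pass 2: swap the roles of source and target, reuse the same weights.
    out_t = out.T + self_attention(out.T, *weights)  # residual connection
    return out_t.T

# Toy usage: 16 spatial positions per image, 32-dimensional features.
rng = np.random.default_rng(0)
src = rng.standard_normal((16, 32))
tgt = rng.standard_normal((16, 32))
weights = tuple(0.1 * rng.standard_normal((16, 16)) for _ in range(3))

corr = correlation_map(src, tgt)
refined = aggregate_with_swap(corr, weights)
```

Because attention lets every position attend to every other position, each refined score can depend on the whole correlation map, which is the global receptive field the abstract contrasts with CNN-based aggregation. Sharing weights across the original and transposed map is one simple way to treat both matching directions alike.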


