Training Vision Transformers for Image Retrieval

  • 2021-02-10 18:56:41
  • Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou
  • 105

Abstract

Transformers have shown outstanding results for natural language understanding and, more recently, for image classification. We here extend this work and propose a transformer-based approach for image retrieval: we adopt vision transformers for generating image descriptors and train the resulting model with a metric learning objective, which combines a contrastive loss with a differential entropy regularizer. Our results show consistent and significant improvements of transformers over convolution-based approaches. In particular, our method outperforms the state of the art on several public benchmarks for category-level retrieval, namely Stanford Online Product, In-Shop and CUB-200. Furthermore, our experiments on ROxford and RParis also show that, in comparable settings, transformers are competitive for particular object retrieval, especially in the regime of short vector representations and low-resolution images.
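
The training objective named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes a cosine-similarity contrastive loss with a margin on negative pairs and a nearest-neighbour log-distance estimate of differential entropy. The function names and the default values of `margin` and `reg_weight` are illustrative placeholders.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Project descriptors onto the unit sphere so dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def contrastive_loss(z, labels, margin=0.5):
    # Assumed form: pull same-label pairs together, and push different-label
    # pairs apart only when their similarity exceeds the margin.
    sim = z @ z.T
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(z), dtype=bool)
    pos = np.where(same & off_diag, 1.0 - sim, 0.0)
    neg = np.where(~same, np.maximum(sim - margin, 0.0), 0.0)
    return (pos.sum() + neg.sum()) / len(z)

def entropy_regularizer(z, eps=1e-8):
    # Assumed estimator: negative mean log distance to the nearest neighbour.
    # Minimising it spreads descriptors out, which raises the entropy estimate.
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # ignore self-distances
    nearest = dist.min(axis=1)
    return -np.mean(np.log(nearest + eps))

def retrieval_loss(embeddings, labels, margin=0.5, reg_weight=0.7):
    # Weighted sum of the two terms; the weight is a hypothetical hyperparameter.
    z = l2_normalize(embeddings)
    return contrastive_loss(z, labels, margin) + reg_weight * entropy_regularizer(z)
```

In this sketch, `embeddings` would be a batch of descriptors produced by the vision transformer and `labels` the corresponding class or instance identifiers. The regularizer penalises batches whose descriptors collapse onto each other, which is the behaviour the contrastive term alone does not prevent.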

 
