BoQ: A Place is Worth a Bag of Learnable Queries

Abstract

In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a set of global queries designed to capture universal place-specific attributes. Unlike existing methods that employ self-attention and generate the queries directly from the input features, BoQ employs distinct learnable global queries, which probe the input features via cross-attention, ensuring consistent information aggregation. In addition, our technique provides an interpretable attention mechanism and integrates with both CNN and Vision Transformer backbones. The performance of BoQ is demonstrated through extensive experiments on 14 large-scale benchmarks. It consistently outperforms current state-of-the-art techniques including NetVLAD, MixVPR and EigenPlaces. Moreover, as a global retrieval technique (one-stage), BoQ surpasses two-stage retrieval methods, such as Patch-NetVLAD, TransVPR and R2Former, all while being orders of magnitude faster and more efficient. The code and model weights are publicly available at https://github.com/amaralibey/Bag-of-Queries.
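To make the central idea concrete, the following is a minimal sketch of a fixed set of queries probing local features via cross-attention. It is not the released implementation. Every name in it (`cross_attend`, `describe`, `QUERIES`, `DIM`, `NUM_QUERIES`) is hypothetical, and the design choices are assumptions made for illustration: a single attention head, no learned projections, random vectors standing in for trained queries and backbone features, and a descriptor formed by concatenating the query outputs and L2-normalizing them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, features):
    """Let each query attend over all local features.

    Assumption: single head, scaled dot-product, no key/value projections.
    Returns one aggregated vector and one attention row per query.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(qi * fi for qi, fi in zip(q, f)) for f in features]
        w = softmax(scores)
        outputs.append([sum(wj * f[d] for wj, f in zip(w, features))
                        for d in range(dim)])
        weights.append(w)
    return outputs, weights


def describe(queries, features):
    """Concatenate the query outputs and L2-normalize (an assumed pooling)."""
    outputs, weights = cross_attend(queries, features)
    flat = [v for out in outputs for v in out]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat], weights


rng = random.Random(0)
DIM, NUM_QUERIES = 8, 4

# Stand-in for trained parameters: the same queries are reused for every image.
QUERIES = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

# Stand-ins for backbone outputs with different numbers of local features.
image_a = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(16)]
image_b = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(25)]

desc_a, attn_a = describe(QUERIES, image_a)
desc_b, attn_b = describe(QUERIES, image_b)
```

Because the queries do not depend on the input, both descriptors have the same length (`NUM_QUERIES * DIM`) regardless of how many local features each image yields, and each row of `attn_a` shows which features a given query attended to. In a self-attention design, by contrast, the queries would be computed from the input features themselves.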
