Abstract
Neural information retrieval (IR) has greatly advanced search and other knowledge-intensive language tasks. While many neural IR methods encode queries and documents into single-vector representations, late interaction models produce multi-vector representations at the granularity of each token and decompose relevance modeling into scalable token-level computations. This decomposition has been shown to make late interaction more effective, but it inflates the space footprint of these models by an order of magnitude. In this work, we introduce ColBERTv2, a retriever that couples an aggressive residual compression mechanism with a denoised supervision strategy to simultaneously improve the quality and space footprint of late interaction. We evaluate ColBERTv2 across a wide range of benchmarks, establishing state-of-the-art quality within and outside the training domain while reducing the space footprint of late interaction models by 5--8$\times$.