Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

  • 2020-07-01 23:15:56
  • Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, Arnold Overwijk
  • 60

Abstract

Conducting text retrieval in a dense learned representation space has many intriguing advantages over sparse retrieval. Yet the effectiveness of dense retrieval (DR) often requires combination with sparse retrieval. In this paper, we identify that the main bottleneck is in the training mechanisms, where the negative instances used in training are not representative of the irrelevant documents in testing. This paper presents Approximate nearest neighbor Negative Contrastive Estimation (ANCE), a training mechanism that constructs negatives from an Approximate Nearest Neighbor (ANN) index of the corpus, which is parallelly updated with the learning process to select more realistic negative training instances. This fundamentally resolves the discrepancy between the data distribution used in the training and testing of DR. In our experiments, ANCE boosts the BERT-Siamese DR model to outperform all competitive dense and sparse retrieval baselines. It nearly matches the accuracy of sparse-retrieval-and-BERT-reranking using dot-product in the ANCE-learned representation space and provides almost 100x speed-up.
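To make the negative-selection idea in the abstract concrete, here is a minimal sketch of picking the highest-scoring non-relevant documents for a query by dot product. It is an illustration under stated assumptions, not the authors' implementation: exhaustive search stands in for the ANN index, the vectors are fixed toy values rather than outputs of a trained encoder, the parallel index refresh is omitted, and the names `dot`, `hardest_negatives`, `doc_vecs`, and `relevant` are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hardest_negatives(
    query_vec: Vector,
    doc_vecs: Dict[str, Vector],
    relevant: Set[str],
    k: int,
) -> List[str]:
    """Return the k top-scoring documents that are not labeled relevant.

    Assumption: exhaustive scoring is used here in place of an ANN index,
    which is only practical for a toy corpus.
    """
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: dot(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return [doc_id for doc_id in ranked if doc_id not in relevant][:k]


# Toy corpus with hand-picked 2-d vectors (illustrative values only).
doc_vecs = {
    "d1": [1.0, 0.0],
    "d2": [0.9, 0.1],
    "d3": [0.0, 1.0],
    "d4": [0.7, 0.3],
}
query = [1.0, 0.0]

# "d1" is the labeled positive, so it is excluded from the negatives.
negatives = hardest_negatives(query, doc_vecs, relevant={"d1"}, k=2)
print(negatives)  # ['d2', 'd4']
```

In a training loop along the lines the abstract describes, the document vectors would be re-encoded with the current model and the index rebuilt periodically, so that the selected negatives track what the model currently retrieves.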

 
