Dense X Retrieval: What Retrieval Granularity Should We Use?

  • 2023-12-11 18:57:35
  • Tong Chen, Hongwei Wang, Sihao Chen, Wenhao Yu, Kaixin Ma, Xinran Zhao, Dong Yu, Hongming Zhang

Abstract

Dense retrieval has become a prominent method to obtain relevant context or world knowledge in open-domain NLP tasks. When we use a learned dense retriever on a retrieval corpus at inference time, an often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence. We discover that the retrieval unit choice significantly impacts the performance of both retrieval and downstream tasks. Distinct from the typical approach of using passages or sentences, we introduce a novel retrieval unit, proposition, for dense retrieval. Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid and presented in a concise, self-contained natural language format. We conduct an empirical comparison of different retrieval granularity. Our results reveal that proposition-based retrieval significantly outperforms traditional passage or sentence-based methods in dense retrieval. Moreover, retrieval by proposition also enhances the performance of downstream QA tasks, since the retrieved texts are more condensed with question-relevant information, reducing the need for lengthy input tokens and minimizing the inclusion of extraneous, irrelevant information.
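As a rough illustration of what "retrieval unit" means in practice, the sketch below indexes one short passage at three granularities and retrieves the top unit for a question. It is not the authors' pipeline: the bag-of-words `embed` function is a toy stand-in for a learned dense retriever, the propositions are written by hand rather than produced by any model, and the names `embed`, `cosine`, `retrieve`, `units`, and `STOPWORDS` are hypothetical. The ranking it produces shows the mechanics only and is not evidence for the results reported above.

```python
import math
import re
from collections import Counter

# Toy stopword list (assumption; only large enough for this example).
STOPWORDS = {"the", "of", "a", "is", "in", "at", "an", "to", "it",
             "how", "many", "does", "what", "and", "on"}


def embed(text):
    """Bag-of-words counts; a toy stand-in for a learned dense encoder."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, index_units, k=1):
    """Return the k units most similar to the query."""
    q = embed(query)
    ranked = sorted(index_units, key=lambda u: cosine(q, embed(u)), reverse=True)
    return ranked[:k]


passage = (
    "The Leaning Tower of Pisa is a bell tower in Italy. "
    "Prior to restoration work between 1990 and 2001, "
    "the tower leaned at an angle of 5.5 degrees. "
    "It now leans at about 3.99 degrees."
)

# The same text indexed at three granularities.
units = {
    "passage": [passage],
    # Naive split on sentence-final punctuation followed by whitespace.
    "sentence": re.split(r"(?<=[.!?])\s+", passage),
    # Hand-written propositions (assumption): one factoid each, with the
    # subject restated so every unit is self-contained.
    "proposition": [
        "The Leaning Tower of Pisa is a bell tower in Italy.",
        "The Leaning Tower of Pisa leaned at an angle of 5.5 degrees "
        "prior to restoration work.",
        "The restoration work on the Leaning Tower of Pisa took place "
        "between 1990 and 2001.",
        "The Leaning Tower of Pisa now leans at about 3.99 degrees.",
    ],
}

query = "How many degrees does the Leaning Tower of Pisa lean now?"
for granularity, index_units in units.items():
    top = retrieve(query, index_units)[0]
    print(f"{granularity}: {len(top.split())} words retrieved: {top}")
```

Note that the third sentence ("It now leans ...") loses its subject once it is separated from the passage, whereas the corresponding proposition restates it. This is the "self-contained" property the abstract describes, and the per-unit word counts printed by the loop show how much shorter the retrieved text becomes at finer granularity.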

 
