The Wisdom of Many Queries: Complexity-Diversity Principle for Dense Retriever Training

  • 2026-02-24 15:35:33
  • Xincan Feng, Noriki Nishida, Yusuke Sakai, Yuji Matsumoto

Abstract

Prior work on synthetic query generation for dense retrieval produces one query per document and focuses on query quality. We systematically study multi-query synthesis and find a quality-diversity trade-off: quality benefits in-domain retrieval, while diversity benefits out-of-domain (OOD) retrieval. Experiments on 31 datasets show that diversity especially benefits multi-hop retrieval. Our analysis shows that the benefit of diversity correlates with query complexity ($r \geq 0.95$), measured by the number of content words (CW). We formalize this as the Complexity-Diversity Principle (CDP): query complexity determines the optimal level of diversity. CDP provides thresholds (CW $> 10$: use diversity; CW $< 7$: avoid it) and enables CW-weighted training that improves OOD performance even with single-query data.
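
The CDP thresholds above can be read as a simple decision rule. The following is a minimal sketch, not the authors' implementation: the abstract does not specify how content words are identified, so the stop-word list `STOP_WORDS` and the helpers `count_content_words` and `diversity_recommendation` are hypothetical, and treating the range between 7 and 10 as "unspecified" is an assumption, since the abstract gives no guidance for it.

```python
import re

# Hypothetical, deliberately small stop-word list. The abstract does not
# define how content words are identified; a real setup would more likely
# use a POS tagger or a fuller function-word list.
STOP_WORDS = frozenset({
    "a", "an", "the", "is", "are", "was", "were", "be", "of", "in", "on",
    "at", "to", "for", "by", "with", "and", "or", "what", "which", "who",
    "how", "when", "where", "why", "do", "does", "did", "it", "that", "this",
})


def count_content_words(query: str) -> int:
    """Approximate CW as the number of tokens not in STOP_WORDS (assumption)."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return sum(1 for token in tokens if token not in STOP_WORDS)


def diversity_recommendation(cw: float) -> str:
    """Apply the thresholds stated in the abstract to a CW value."""
    if cw > 10:
        return "use diversity"
    if cw < 7:
        return "avoid diversity"
    # The abstract states no recommendation for 7 <= CW <= 10.
    return "unspecified"
```

For instance, a short factoid query such as "What is the capital of France?" has only two tokens outside this stop-word list, so it falls under the CW $< 7$ threshold. Whether the thresholds are meant for individual queries or for the average CW of a dataset is not stated in the abstract.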

 
