The Power of Noise: Redefining Retrieval for RAG Systems

  • 2024-01-29 18:52:52
  • Florin Cuconasu, Giovanni Trappolini, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, Fabrizio Silvestri

Abstract

Retrieval-Augmented Generation (RAG) systems represent a significant advancement over traditional Large Language Models (LLMs). RAG systems enhance their generation ability by incorporating external data retrieved through an Information Retrieval (IR) phase, overcoming the limitations of standard LLMs, which are restricted to their pre-trained knowledge and limited context window. Most research in this area has predominantly concentrated on the generative aspect of LLMs within RAG systems. Our study fills this gap by thoroughly and critically analyzing the influence of IR components on RAG systems. This paper analyzes which characteristics a retriever should possess for an effective RAG's prompt formulation, focusing on the type of documents that should be retrieved. We evaluate various elements, such as the relevance of the documents to the prompt, their position, and the number included in the context. Our findings reveal, among other insights, that including irrelevant documents can unexpectedly enhance performance by more than 30% in accuracy, contradicting our initial assumption of diminished quality. These results underscore the need for developing specialized strategies to integrate retrieval with language generation models, thereby laying the groundwork for future research in this field.

 
