Retrieval-Augmented Generation with Estimation of Source Reliability

Abstract

Retrieval-augmented generation (RAG) addresses key limitations of large language models (LLMs), such as hallucinations and outdated knowledge, by incorporating external databases. These databases typically draw on multiple sources to cover up-to-date and diverse information. However, standard RAG methods often overlook the heterogeneous reliability of sources in a multi-source database and retrieve documents solely based on relevance, making them prone to propagating misinformation. To address this, we propose Reliability-Aware RAG (RA-RAG), which estimates the reliability of multiple sources and incorporates this information into both the retrieval and aggregation processes. Specifically, it iteratively estimates source reliability and true answers for a set of queries without labels. Then, it selectively retrieves relevant documents from a few reliable sources and aggregates them using weighted majority voting, where the selective retrieval ensures scalability without compromising performance. We also introduce a benchmark designed to reflect real-world scenarios with heterogeneous source reliability and demonstrate the effectiveness of RA-RAG compared to a set of baselines.
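
The abstract names three steps: label-free iterative estimation of source reliability, selection of a few reliable sources, and weighted majority voting. The sketch below illustrates one simple way these steps could fit together; it is not the paper's implementation. Everything in it is an assumption made for illustration: the function names (`estimate_reliability`, `select_top_k`, `weighted_majority_vote`), the log-odds weighting, the smoothing, the initial reliability of 0.7, and the representation of each query as a mapping from source name to that source's answer.

```python
import math
from collections import defaultdict


def log_odds(reliability, eps=1e-6):
    """Turn a reliability in (0, 1) into a vote weight (assumed weighting scheme)."""
    r = min(max(reliability, eps), 1.0 - eps)
    return math.log(r / (1.0 - r))


def weighted_majority_vote(answers, weights):
    """answers: source -> answer; weights: source -> float. Returns the top-scoring answer."""
    scores = defaultdict(float)
    for source, answer in answers.items():
        scores[answer] += weights.get(source, 0.0)
    return max(scores, key=scores.get)


def estimate_reliability(responses, n_iters=10, init=0.7):
    """Alternate between voting on answers and re-scoring sources, using no labels.

    responses: list with one dict per query, each mapping source -> answer.
    Returns (reliability per source, estimated answer per query).
    """
    sources = {s for r in responses for s in r}
    reliability = {s: init for s in sources}
    estimates = []
    for _ in range(n_iters):
        # Step 1: estimate answers with the current reliability weights.
        weights = {s: log_odds(p) for s, p in reliability.items()}
        estimates = [weighted_majority_vote(r, weights) for r in responses]
        # Step 2: re-estimate each source as its smoothed agreement rate
        # with the current answer estimates.
        for s in sources:
            answered = [(r[s], est) for r, est in zip(responses, estimates) if s in r]
            agree = sum(a == est for a, est in answered)
            reliability[s] = (agree + 1) / (len(answered) + 2)
    return reliability, estimates


def select_top_k(reliability, k):
    """Keep only the k sources with the highest estimated reliability."""
    return sorted(reliability, key=reliability.get, reverse=True)[:k]
```

In this sketch, a source whose estimated reliability drops below 0.5 receives a negative weight, which is a consequence of the assumed log-odds scheme rather than something stated in the abstract. Restricting later queries to the output of `select_top_k` is what would keep the number of retrieval calls small as the number of sources grows.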
