LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering

Abstract

Long-Context Question Answering (LCQA), a challenging task, aims to reason over long-context documents to yield accurate answers to questions. Existing long-context Large Language Models (LLMs) for LCQA often struggle with the "lost in the middle" issue. Retrieval-Augmented Generation (RAG) mitigates this issue by providing external factual evidence. However, its chunking strategy disrupts the global long-context information, and its low-quality retrieval in long contexts hinders LLMs from identifying effective factual details due to substantial noise. To this end, we propose LongRAG, a general, dual-perspective, and robust LLM-based RAG system paradigm for LCQA to enhance RAG's understanding of complex long-context knowledge (i.e., global information and factual details). We design LongRAG as a plug-and-play paradigm, facilitating adaptation to various domains and LLMs. Extensive experiments on three multi-hop datasets demonstrate that LongRAG significantly outperforms long-context LLMs (up by 6.94%), advanced RAG (up by 6.16%), and Vanilla RAG (up by 17.25%). Furthermore, we conduct quantitative ablation studies and multi-dimensional analyses, highlighting the effectiveness of the system's components and fine-tuning strategies. Data and code are available at https://github.com/QingFei1/LongRAG.
