Multi-Head RAG: Solving Multi-Aspect Problems with LLMs

Abstract

Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses. Existing RAG solutions do not focus on queries that may require fetching multiple documents with substantially different contents. Such queries occur frequently, but are challenging because the embeddings of these documents may be distant in the embedding space, making it hard to retrieve them all. This paper introduces Multi-Head RAG (MRAG), a novel scheme designed to address this gap with a simple yet powerful idea: leveraging activations of the Transformer's multi-head attention layer, instead of the decoder layer, as keys for fetching multi-aspect documents. The driving motivation is that different attention heads can learn to capture different data aspects. Harnessing the corresponding activations results in embeddings that represent various facets of data items and queries, improving the retrieval accuracy for complex queries. We provide an evaluation methodology and metrics, synthetic datasets, and real-world use cases to demonstrate MRAG's effectiveness, showing improvements of up to 20% in relevance over standard RAG baselines. MRAG can be seamlessly integrated with existing RAG frameworks and benchmarking tools like RAGAS as well as different classes of data stores.
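The core idea of treating each attention head's slice of the activation as its own embedding space, and searching each space separately, can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not MRAG's actual implementation: the concatenated multi-head activation vectors are assumed to be already extracted from a model (that step is out of scope here), and the names `split_heads`, `cosine`, and `multi_head_retrieve`, as well as the reciprocal-rank vote used to merge per-head results, are hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def split_heads(activation, num_heads):
    """Split one concatenated multi-head activation into equal per-head slices."""
    if len(activation) % num_heads != 0:
        raise ValueError("activation length must be divisible by num_heads")
    d = len(activation) // num_heads
    return [activation[i * d:(i + 1) * d] for i in range(num_heads)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def multi_head_retrieve(query_act, doc_acts, num_heads, k):
    """Return ids of the top-k documents.

    Each head is searched independently; the per-head top-k lists are then
    merged with a reciprocal-rank vote (an assumed aggregation, for
    illustration only). Ties are broken by document id for determinism.

    query_act: concatenated activation for the query.
    doc_acts: dict mapping document id -> concatenated activation.
    """
    q_heads = split_heads(query_act, num_heads)
    d_heads = {doc_id: split_heads(act, num_heads) for doc_id, act in doc_acts.items()}
    scores = defaultdict(float)
    for h in range(num_heads):
        # Rank all documents in head h's embedding space (stable sort).
        ranked = sorted(
            d_heads,
            key=lambda doc_id: cosine(q_heads[h], d_heads[doc_id][h]),
            reverse=True,
        )
        for rank, doc_id in enumerate(ranked[:k]):
            scores[doc_id] += 1.0 / (rank + 1)  # hypothetical vote weight
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:k]


if __name__ == "__main__":
    # Two heads of width 2: the query's head-0 slice matches "a",
    # its head-1 slice matches "b", and "d" matches neither aspect.
    docs = {"a": [1, 0, 1, 0], "b": [0, 1, 0, 1], "d": [-1, 0, -1, 0]}
    print(multi_head_retrieve([1, 0, 0, 1], docs, num_heads=2, k=2))
```

Because each head is searched on its own, a document that matches only one aspect of the query can still be retrieved, which is the behavior the abstract motivates; a single embedding would instead have to place the query near both documents at once. How the per-head results are weighted and merged is a design choice, and the reciprocal-rank vote above is only one simple option.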
