On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation

Abstract

Retrieval-augmented generation (RAG) with large language models (LLMs) has demonstrated strong performance in multilingual question-answering (QA) tasks by leveraging relevant passages retrieved from corpora. In multilingual RAG (mRAG), the retrieved passages can be written in languages other than that of the query entered by the user, making it challenging for LLMs to effectively utilize the provided information. Recent research suggests that retrieving passages from multilingual corpora can improve RAG performance, particularly for low-resource languages. However, the extent to which LLMs can leverage different kinds of multilingual contexts to generate accurate answers, *independently from retrieval quality*, remains understudied. In this paper, we conduct an extensive assessment of LLMs' ability to (i) make consistent use of a relevant passage regardless of its language, (ii) respond in the expected language, and (iii) focus on the relevant passage even when multiple 'distracting' passages in different languages are provided in the context. Our experiments with four LLMs across three QA datasets covering a total of 48 languages reveal a surprising ability of LLMs to extract the relevant information from out-language passages, but a much weaker ability to formulate a full answer in the correct language. Our analysis, based on both accuracy and feature attribution techniques, further shows that distracting passages negatively impact answer quality regardless of their language. However, distractors in the query language exert a slightly stronger influence. Taken together, our findings deepen the understanding of how LLMs utilize context in mRAG systems, providing directions for future improvements.
