Not All Languages are Equal: Insights into Multilingual Retrieval-Augmented Generation

Abstract

RALMs (Retrieval-Augmented Language Models) broaden their knowledge scope by incorporating external textual resources. However, the multilingual nature of global knowledge requires RALMs to handle diverse languages, a topic that has received limited research attention. In this work, we propose Futurepedia, a carefully crafted benchmark containing parallel texts across eight representative languages. We evaluate six multilingual RALMs on our benchmark to explore the challenges they face. Experimental results reveal linguistic inequalities: 1) high-resource languages stand out in monolingual knowledge extraction; 2) Indo-European languages lead RALMs to provide answers directly from documents, alleviating the challenge of expressing answers across languages; 3) English benefits from RALMs' selection bias and speaks louder in multilingual knowledge selection. Based on these findings, we offer advice for improving multilingual Retrieval-Augmented Generation. For monolingual knowledge extraction, careful attention must be paid to cascading errors from translating low-resource languages into high-resource ones. In cross-lingual knowledge transfer, encouraging RALMs to provide answers within documents in different languages can improve transfer performance. For multilingual knowledge selection, incorporating more non-English documents and repositioning English documents can help mitigate RALMs' selection bias. Through comprehensive experiments, we underscore the complexities inherent in multilingual RALMs and offer valuable insights for future research.
