Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task

  • 2025-04-04 18:35:43
  • Leonardo Ranaldi, Barry Haddow, Alexandra Birch

Abstract

Retrieval-augmented generation (RAG) has become a cornerstone of contemporary NLP, enhancing large language models (LLMs) by allowing them to access richer factual contexts through in-context retrieval. While effective in monolingual settings, especially in English, its use in multilingual tasks remains unexplored. This paper investigates the effectiveness of RAG across multiple languages by proposing novel approaches for multilingual open-domain question-answering. We evaluate the performance of various multilingual RAG strategies, including question-translation (tRAG), which translates questions into English before retrieval, and Multilingual RAG (MultiRAG), where retrieval occurs directly across multiple languages. Our findings reveal that tRAG, while useful, suffers from limited coverage. In contrast, MultiRAG improves efficiency by enabling multilingual retrieval but introduces inconsistencies due to cross-lingual variations in the retrieved content. To address these issues, we propose Crosslingual RAG (CrossRAG), a method that translates retrieved documents into a common language (e.g., English) before generating the response. Our experiments show that CrossRAG significantly enhances performance on knowledge-intensive tasks, benefiting both high-resource and low-resource languages.
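
The three strategies named in the abstract differ mainly in where translation happens relative to retrieval. The following is a minimal sketch of that control flow, not the authors' implementation: the function names (`t_rag`, `multi_rag`, `cross_rag`) and the `translate`, `retrieve`, and `generate` callables are hypothetical placeholders for whatever translation system, retriever, and LLM are actually used. It also assumes the original question is passed to the generator in every case, which the abstract does not specify.

```python
from typing import Callable, List

# Hypothetical component signatures (assumptions, not taken from the paper):
Translate = Callable[[str, str], str]             # (text, target_lang) -> text
Retrieve = Callable[[str, List[str]], List[str]]  # (query, langs) -> documents
Generate = Callable[[str, List[str]], str]        # (question, documents) -> answer


def t_rag(question: str, translate: Translate, retrieve: Retrieve,
          generate: Generate, pivot: str = "en") -> str:
    """tRAG: translate the question into the pivot language before retrieval."""
    pivot_question = translate(question, pivot)
    docs = retrieve(pivot_question, [pivot])  # retrieval is pivot-language only
    return generate(question, docs)


def multi_rag(question: str, retrieve: Retrieve, generate: Generate,
              langs: List[str]) -> str:
    """MultiRAG: retrieve directly across several languages, no translation."""
    docs = retrieve(question, langs)  # documents may be in mixed languages
    return generate(question, docs)


def cross_rag(question: str, translate: Translate, retrieve: Retrieve,
              generate: Generate, langs: List[str], pivot: str = "en") -> str:
    """CrossRAG: retrieve multilingually, then translate the retrieved
    documents into one common language before generation."""
    docs = retrieve(question, langs)
    pivot_docs = [translate(doc, pivot) for doc in docs]
    return generate(question, pivot_docs)
```

In this sketch, tRAG restricts retrieval to the pivot language, which corresponds to the limited coverage the abstract mentions, while CrossRAG keeps MultiRAG's multilingual retrieval and normalizes only the documents handed to the generator.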

 
