Abstract
Federated Retrieval-Augmented Generation (Federated RAG) combines Federated Learning (FL), which enables distributed model training without exposing raw data, with Retrieval-Augmented Generation (RAG), which improves the factual accuracy of language models by grounding outputs in external knowledge. As large language models are increasingly deployed in privacy-sensitive domains such as healthcare, finance, and personalized assistance, Federated RAG offers a promising framework for secure, knowledge-intensive natural language processing (NLP). To the best of our knowledge, this paper presents the first systematic mapping study of Federated RAG, covering literature published between 2020 and 2025. Following Kitchenham's guidelines for evidence-based software engineering, we develop a structured classification of research focuses, contribution types, and application domains. We analyze architectural patterns, temporal trends, and key challenges, including privacy-preserving retrieval, cross-client heterogeneity, and evaluation limitations. Our findings synthesize a rapidly evolving body of research, identify recurring design patterns, and surface open questions, providing a foundation for future work at the intersection of RAG and federated systems.