PersianRAG: A Retrieval-Augmented Generation System for Persian Language

  • 2024-11-06 11:19:42
  • Hossein Hosseini, Mohammad Sobhan Zare, Amir Hossein Mohammadi, Arefeh Kazemi, Zahra Zojaji, Mohammad Ali Nematbakhsh

Abstract

Retrieval-augmented generation (RAG) models, which integrate large-scale pre-trained generative models with external retrieval mechanisms, have shown significant success in various natural language processing (NLP) tasks. However, applying RAG models to Persian, a low-resource language, poses distinct challenges. These challenges primarily involve the preprocessing, embedding, retrieval, prompt construction, language modeling, and response evaluation stages of the system. In this paper, we address the challenges of implementing a real-world RAG system for the Persian language, called PersianRAG. We propose novel solutions to overcome these obstacles and evaluate our approach using several Persian benchmark datasets. Our experimental results demonstrate the capability of the PersianRAG framework to enhance the question answering task in Persian.

 
