INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages

Abstract

Large Language Models (LLMs) have demonstrated remarkable zero-shot and few-shot capabilities in unseen tasks, including context-grounded question answering (QA) in English. However, the evaluation of LLMs' capabilities in non-English languages for context-based QA is limited by the scarcity of benchmarks in non-English languages. To address this gap, we introduce Indic-QA, the largest publicly available context-grounded question-answering dataset for 11 major Indian languages from two language families. The dataset comprises both extractive and abstractive question-answering tasks and includes existing datasets as well as English QA datasets translated into Indian languages. Additionally, we generate a synthetic dataset using the Gemini model to create question-answer pairs given a passage, which is then manually verified for quality assurance. We evaluate various multilingual Large Language Models and their instruction-fine-tuned variants on the benchmark and observe that their performance is subpar, particularly for low-resource languages. We hope that the release of this dataset will stimulate further research on the question-answering abilities of LLMs for low-resource languages.
