Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages

Abstract

Multilingual Large Language Models (LLMs) have demonstrated significant effectiveness across various languages, particularly in high-resource languages such as English. However, their performance in terms of factual accuracy across other low-resource languages, especially Indic languages, remains an area of investigation. In this study, we assess the factual accuracy of LLMs - GPT-4o, Gemma-2-9B, Gemma-2-2B, and Llama-3.1-8B - by comparing their performance in English and Indic languages using the IndicQuest dataset, which contains question-answer pairs in English and 19 Indic languages. By asking the same questions in English and their respective Indic translations, we analyze whether the models are more reliable for regional context questions in Indic languages or when operating in English. Our findings reveal that LLMs often perform better in English, even for questions rooted in Indic contexts. Notably, we observe a higher tendency for hallucination in responses generated in low-resource Indic languages, highlighting challenges in the multilingual understanding capabilities of current LLMs.
