Better to Ask in English: Evaluation of Large Language Models on English, Low-resource and Cross-Lingual Settings

Abstract

Large Language Models (LLMs) are trained on massive amounts of data, enabling their application across diverse domains and tasks. Despite their remarkable performance, most LLMs are developed and evaluated primarily in English. Recently, a few multilingual LLMs have emerged, but their performance in low-resource languages, especially the most widely spoken languages of South Asia, remains less explored. To address this gap, we evaluate LLMs such as GPT-4, Llama 2, and Gemini to compare their effectiveness in English with that in low-resource South Asian languages (e.g., Bangla, Hindi, and Urdu). Specifically, we use zero-shot prompting and five different prompt settings to extensively investigate the effectiveness of the LLMs on cross-lingual translated prompts. The findings suggest that GPT-4 outperformed Llama 2 and Gemini in all five prompt settings and across all languages. Moreover, all three LLMs performed better with English prompts than with prompts in the low-resource languages. This study extensively investigates LLMs in low-resource language contexts to highlight the improvements required in LLMs and language-specific resources to develop more general-purpose NLP applications.
