BnMMLU: Measuring Massive Multitask Language Understanding in Bengali

  • 2025-05-25 03:54:31
  • Saman Sarker Joy

Abstract

The Massive Multitask Language Understanding (MMLU) benchmark has been widely used to evaluate language models across various domains. However, existing MMLU datasets primarily focus on high-resource languages such as English, which leaves low-resource languages like Bengali underrepresented. In this paper, we introduce BnMMLU, a benchmark for evaluating the multitask language understanding capabilities of language models in Bengali. The dataset spans 23 domains, including science, humanities, mathematics, and general knowledge, and is structured in a multiple-choice format to assess the factual knowledge, application-based problem-solving, and reasoning abilities of language models. It consists of 138,949 question-option pairs. We benchmark several proprietary and open-source large language models (LLMs) on the BnMMLU test set. Additionally, we annotate the test set with three cognitive categories (factual knowledge, procedural application, and reasoning) to gain deeper insights into model strengths and weaknesses across various cognitive tasks. The results reveal significant performance gaps, highlighting the need for improved pre-training and fine-tuning strategies tailored to Bengali data. We release the dataset and benchmark results to facilitate further research in this area.

 
