Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation

  • 2025-01-04 20:42:33
  • M. Ali Bayram, Ali Arda Fincan, Ahmet Semih Gümüş, Banu Diri, Savaş Yıldırım, Öner Aytaş

Abstract

Language models have made remarkable advancements in understanding and generating human language, achieving notable success across a wide array of applications. However, evaluating these models remains a significant challenge, particularly for resource-limited languages such as Turkish. To address this gap, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. TR-MMLU is constructed from a carefully curated dataset comprising 6,200 multiple-choice questions across 62 sections, selected from a pool of 280,000 questions spanning 67 disciplines and over 800 topics within the Turkish education system. This benchmark provides a transparent, reproducible, and culturally relevant tool for evaluating model performance. It serves as a standard framework for Turkish NLP research, enabling detailed analyses of LLMs' capabilities in processing Turkish text and fostering the development of more robust and accurate language models. In this study, we evaluate state-of-the-art LLMs on TR-MMLU, providing insights into their strengths and limitations for Turkish-specific tasks. Our findings reveal critical challenges, such as the impact of tokenization and fine-tuning strategies, and highlight areas for improvement in model design. By setting a new standard for evaluating Turkish language models, TR-MMLU aims to inspire future innovations and support the advancement of Turkish NLP research.
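To make the evaluation setting concrete, the following is a minimal sketch of how accuracy on a sectioned multiple-choice benchmark could be scored. It is not the authors' evaluation code: the record keys (`section`, `question`, `choices`, `answer`), the section names, the `predict` callable, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_section(items, predict):
    """Score a multiple-choice benchmark per section and overall.

    `items` is an iterable of dicts with the hypothetical keys
    'section', 'question', 'choices', and 'answer' (index of the
    correct choice). `predict` is any callable that takes a question
    and its choices and returns the index of the chosen answer.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item["section"]] += 1
        if predict(item["question"], item["choices"]) == item["answer"]:
            correct[item["section"]] += 1
    per_section = {s: correct[s] / total[s] for s in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_section, overall


# Toy placeholder data; real benchmark items would replace these.
items = [
    {"section": "history", "question": "Q1", "choices": ["A", "B", "C", "D"], "answer": 0},
    {"section": "history", "question": "Q2", "choices": ["A", "B", "C", "D"], "answer": 1},
    {"section": "biology", "question": "Q3", "choices": ["A", "B", "C", "D"], "answer": 0},
]


def always_first(question, choices):
    # Trivial baseline standing in for a model: always pick the first choice.
    return 0


per_section, overall = accuracy_by_section(items, always_first)
```

Reporting per-section scores alongside the overall figure is one way to expose the discipline-level strengths and weaknesses the abstract refers to, rather than a single aggregate number.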

 
