IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

  • 2025-01-23 17:57:28
  • David Ifeoluwa Adelani, Jessica Ojo, Israel Abebe Azime, Jian Yun Zhuang, Jesujoba O. Alabi, Xuanli He, Millicent Ochieng, Sara Hooker, Andiswa Bukula, En-Shiun Annie Lee, Chiamaka Chukwuneke, Happy Buzaaba, Blessing Sibanda, Godson Kalipe, Jonathan Mukiibi, Salomon Kabongo, Foutse Yuehgoh, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Shamsuddeen Hassan Muhammad, Salomey Osei, Sokhar Samb, Tadesse Kebede Guge, Tombekai Vangoni Sherman, Pontus Stenetorp

Abstract

Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks, because appropriate or comprehensive benchmarks are lacking outside of high-resource languages. In this paper, we introduce IrokoBench, a human-translated benchmark dataset for 17 typologically diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multiple-choice knowledge-based question answering (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and six proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We also observe a significant gap between open and proprietary models: the highest-performing open model, Gemma 2 27B, reaches only 63% of the performance of the best-performing proprietary model, GPT-4o. In addition, machine-translating the test set into English before evaluation helped close the gap for larger English-centric models, such as Gemma 2 27B and LLaMa 3.1 70B. These findings suggest that more effort is needed to develop and adapt LLMs for African languages.
