Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

  • 2025-10-28 05:46:25
  • Tyler A. Chang, Catherine Arnett, Abdelrahman Eldesokey, Abdelrahman Sadallah, Abeer Kashar, Abolade Daud, Abosede Grace Olanihun, Adamu Labaran Mohammed, Adeyemi Praise, Adhikarinayum Meerajita Sharma, Aditi Gupta, Afitab Iyigun, Afonso Simplício, Ahmed Essouaied, Aicha Chorana, Akhil Eppa, Akintunde Oladipo, Akshay Ramesh, Aleksei Dorkin, Alfred Malengo Kondoro, Alham Fikri Aji, Ali Eren Çetintaş, Allan Hanbury, Alou Dembele, Alp Niksarli, Álvaro Arroyo, Amin Bajand, Amol Khanna, Ana Chkhaidze, Ana Condez, Andiswa Mkhonto, Andrew Hoblitzell, Andrew Tran, Angelos Poulis, Anirban Majumder, Anna Vacalopoulou, Annette Kuuipolani Kanahele Wong, Annika Simonsen, Anton Kovalev, Ashvanth. S, Ayodeji Joseph Lana, Barkin Kinay, Bashar Alhafni, Benedict Cibalinda Busole, Bernard Ghanem, Bharti Nath

Abstract

To date, there exist almost no culturally-specific evaluation benchmarks for large language models (LLMs) that cover a large number of languages and cultures. In this paper, we present Global PIQA, a participatory commonsense reasoning benchmark for over 100 languages, constructed by hand by 335 researchers from 65 countries around the world. The 116 language varieties in Global PIQA cover five continents, 14 language families, and 23 writing systems. In the non-parallel split of Global PIQA, over 50% of examples reference local foods, customs, traditions, or other culturally-specific elements. We find that state-of-the-art LLMs perform well on Global PIQA in aggregate, but they exhibit weaker performance in lower-resource languages (up to a 37% accuracy gap, despite random chance at 50%). Open models generally perform worse than proprietary models. Global PIQA highlights that in many languages and cultures, everyday knowledge remains an area for improvement, alongside more widely-discussed capabilities such as complex reasoning and expert knowledge. Beyond its uses for LLM evaluation, we hope that Global PIQA provides a glimpse into the wide diversity of cultures in which human language is embedded.

 
