All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

  • 2024-11-25 15:44:42
  • Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, Omkar Thawakar, Henok Biadglign Ademtew, Yahya Hmaiti, Amandeep Kumar, Kartik Kuckreja, Mykola Maslych, Wafa Al Ghallabi, Mihail Mihaylov, Chao Qin, Abdelrahman M Shaker, Mike Zhang, Mahardika Krisna Ihsani, Amiel Esplana, Monil Gokani, Shachar Mirkin, Harsh Singh, Ashay Srivastava, Endre Hamerlik, Fathinah Asma Izzati, Fadillah Adamsyah Maani, Sebastian Cavada, Jenny Chim, Rohit Gupta, Sanjay Manjunath, Kamila Zhumakhanova, Feno Heriniaina Rabevohitra, Azril Amirudin, Muhammad Ridzuan, Daniya Kareem, Ketan More, Kunyang Li, Pramesh Shakya, Muhammad Saad, Amirpouya Ghasemaghaei, Amirbek Djanibekov, Dilshod Azizov, Branislava Jankovic, Naman Bhatia, Alvaro Cabrera, Johan Obando-Ceron, Olympiah Otieno, Fabi

Abstract

Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions, which are further divided into short and long-answer categories. ALM-bench design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available.

 
