All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Abstract

Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating the corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring several question formats: true/false, multiple-choice, and open-ended questions, with the open-ended questions further divided into short-answer and long-answer categories. ALM-bench's design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open- and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available.
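The abstract does not say how ALM-bench aggregates scores. The sketch below is purely illustrative and assumes each answer has already been judged correct or incorrect. It shows one way to break results down by language or by question format, using the four formats named above. `Item`, `accuracy_by`, the field names, and the sample records are hypothetical and are not part of ALM-bench.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class QuestionFormat(Enum):
    # Formats named in the abstract; open-ended questions split into short and long answers.
    TRUE_FALSE = "true/false"
    MULTIPLE_CHOICE = "multiple choice"
    OPEN_SHORT = "open-ended (short)"
    OPEN_LONG = "open-ended (long)"


@dataclass
class Item:
    # Hypothetical record for one judged benchmark answer; field names are illustrative.
    language: str
    cultural_aspect: str
    fmt: QuestionFormat
    correct: bool  # assumed to be decided upstream (e.g. exact match or a judge model)


def accuracy_by(items, key):
    """Return {group: accuracy}, grouping items by key(item)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for item in items:
        group = key(item)
        totals[group] += 1
        hits[group] += int(item.correct)
    return {group: hits[group] / totals[group] for group in totals}


# Made-up example records, not real benchmark data.
items = [
    Item("Amharic", "Traditions", QuestionFormat.TRUE_FALSE, True),
    Item("Amharic", "Celebrations", QuestionFormat.OPEN_LONG, False),
    Item("Urdu", "Traditions", QuestionFormat.MULTIPLE_CHOICE, True),
]

print(accuracy_by(items, lambda it: it.language))  # {'Amharic': 0.5, 'Urdu': 1.0}
```

Passing a different `key` (for example `lambda it: it.fmt` or `lambda it: it.cultural_aspect`) gives per-format or per-aspect breakdowns from the same records. This kind of breakdown is a natural way to surface gaps on low-resource languages.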
