Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

  • 2025-04-09 18:43:16
  • Israfel Salazar, Manuel Fernández Burda, Shayekh Bin Islam, Arshia Soltani Moakhar, Shivalika Singh, Fabian Farestam, Angelika Romanou, Danylo Boiko, Dipika Khullar, Mike Zhang, Dominik Krzemiński, Jekaterina Novikova, Luísa Shimabucoro, Joseph Marvin Imperial, Rishabh Maheshwary, Sharad Duwal, Alfonso Amayuelas, Swati Rajwal, Jebish Purbey, Ahmed Ruby, Nicholas Popovič, Marek Suppa, Azmine Toushik Wasi, Ram Mohan Rao Kadiyala, Olga Tsymboi, Maksim Kostritsya, Bardia Soltani Moakhar, Gabriel da Costa Merlin, Otávio Ferracioli Coletti, Maral Jabbari Shiviari, MohammadAmin farahani fard, Silvia Fernandez, María Grandury, Dmitry Abulkhanov, Drishti Sharma, Andre Guarnier De Mitri, Leticia Bossatto Marchezi, Johan Obando-Ceron, Nazar Kohut, Beyza Ermis, Desmond Elliott, Enzo Ferrante, Sara Hoo

Abstract

The evaluation of vision-language models (VLMs) has mainly relied on English-language benchmarks, leaving significant gaps in both multilingual and multicultural coverage. While multilingual benchmarks have expanded in both size and language coverage, many rely on translations of English datasets and therefore fail to capture cultural nuances. In this work, we propose Kaleidoscope, the most comprehensive exam benchmark to date for the multilingual evaluation of vision-language models. Kaleidoscope is a large-scale, in-language multimodal benchmark designed to evaluate VLMs across diverse languages and visual inputs. It covers 18 languages and 14 different subjects, amounting to a total of 20,911 multiple-choice questions. Built through an open science collaboration with a diverse group of researchers worldwide, Kaleidoscope ensures linguistic and cultural authenticity. We evaluate top-performing multilingual vision-language models and find that they perform poorly on low-resource languages and in complex multimodal scenarios. Our results highlight the need for progress on culturally inclusive multimodal evaluation frameworks.

 
