Behind Maya: Building a Multilingual Vision Language Model

  • 2025-05-13 20:01:12
  • Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji

Abstract

In recent times, we have seen a rapid development of large Vision-Language Models (VLMs). They have shown impressive results on academic benchmarks, primarily in widely spoken languages but lack performance on low-resource languages and varied cultural contexts. To address these limitations, we introduce Maya, an open-source Multilingual VLM. Our contributions are: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; and 2) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code available at https://github.com/nahidalam/maya.

 
