Behind Maya: Building a Multilingual Vision Language Model

Abstract

In recent times, we have seen a rapid development of large Vision-Language Models (VLMs). They have shown impressive results on academic benchmarks, primarily in widely spoken languages, but lack performance on low-resource languages and varied cultural contexts. To address these limitations, we introduce Maya, an open-source Multilingual VLM. Our contributions are: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; and 2) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code available at https://github.com/nahidalam/maya.
