DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture

  • 2025-09-23 17:40:43
  • Arijit Maji, Raghvendra Kumar, Akash Ghosh, Anushka, Nemil Shah, Abhilekh Borah, Vanshika Shah, Nishant Mishra, Sriparna Saha

Abstract

We introduce DRISHTIKON, a first-of-its-kind multimodal and multilingual benchmark centered exclusively on Indian culture, designed to evaluate the cultural understanding of generative AI systems. Unlike existing benchmarks with a generic or global scope, DRISHTIKON offers deep, fine-grained coverage across India's diverse regions, spanning 15 languages, covering all states and union territories, and incorporating over 64,000 aligned text-image pairs. The dataset captures rich cultural themes including festivals, attire, cuisines, art forms, and historical heritage amongst many more. We evaluate a wide range of vision-language models (VLMs), including open-source small and large models, proprietary systems, reasoning-specialized VLMs, and Indic-focused models, across zero-shot and chain-of-thought settings. Our results expose key limitations in current models' ability to reason over culturally grounded, multimodal inputs, particularly for low-resource languages and less-documented traditions. DRISHTIKON fills a vital gap in inclusive AI research, offering a robust testbed to advance culturally aware, multimodally competent language technologies.

 
