Analysis of Indic Language Capabilities in LLMs

  • 2025-01-23 18:49:33
  • Aatman Vaidya, Tarunima Prabhakar, Denny George, Swair Shah

Abstract

This report evaluates the ability of text-in, text-out Large Language Models (LLMs) to understand and generate Indic languages. This evaluation is used to identify and prioritize Indic languages suited for inclusion in safety benchmarks. We conduct this study by reviewing existing evaluation studies and datasets, and a set of twenty-eight LLMs that support Indic languages. We analyze the LLMs on the basis of the training data, license for model and data, type of access, and model developers. We also compare Indic language performance across evaluation datasets and find significant performance disparities across Indic languages. Hindi is the most widely represented language in models. While model performance roughly correlates with the number of speakers for the top five languages, the assessment after that varies.

 
