HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models

Abstract

Analogies test a model's ability to infer implicit relationships between concepts, making them a key benchmark for evaluating reasoning capabilities. While large language models (LLMs) are widely evaluated for reasoning in English, their abilities in Indic languages remain understudied, limiting our understanding of whether these models generalize across languages. To address this gap, we introduce a new Hindi Analogy Test Set (HATS), comprising 405 multiple-choice questions sourced from Indian government exams. We benchmark state-of-the-art multilingual LLMs using various prompting strategies and introduce a grounded Chain of Thought approach that leverages cognitive theories of analogical reasoning. This approach improves model performance on Hindi analogy questions. Our experiments show that models perform best with English prompts, irrespective of the prompting strategy. Our test set addresses the lack of a critical resource to evaluate LLM reasoning capabilities in Hindi.
