QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs

Abstract

The rise of large language models (LLMs) has created a need for advanced benchmarking systems beyond traditional setups. To this end, we introduce QUENCH, a novel text-based English Quizzing Benchmark manually curated and transcribed from YouTube quiz videos. QUENCH contains masked entities and rationales that the LLMs must predict via generation. At the intersection of geographical context and common sense reasoning, QUENCH helps assess the world knowledge and deduction capabilities of LLMs via a zero-shot, open-domain quizzing setup. We perform an extensive evaluation on 7 LLMs and 4 metrics, investigating the influence of model size, prompting style, geographical context, and gold-labeled rationale generation. The benchmarking concludes with an analysis of the errors to which the LLMs are prone.
