HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models

Abstract

Large Language Models (LLMs) pretrained on massive corpora exhibit remarkable capabilities across a wide range of tasks; however, non-English languages have received limited attention in this field of research. To address this gap and assess the proficiency of language models in the Korean language and culture, we present HAE-RAE Bench, covering 6 tasks including vocabulary, history, and general knowledge. Our evaluation of language models on this benchmark highlights the potential advantages of employing Large Language-Specific Models (LLSMs) over a comprehensive, universal model like GPT-3.5. Remarkably, our study reveals that models approximately 13 times smaller than GPT-3.5 can exhibit similar performance levels in terms of language-specific knowledge retrieval. This observation underscores the importance of homogeneous corpora for training professional-level language-specific models. On the other hand, we also observe a perplexing performance dip in these smaller LMs when they are tasked to generate structured answers.
