KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark

Abstract

As language models are often deployed as chatbot assistants, it becomes a virtue for models to engage in conversations in a user's first language. While these models are trained on a wide range of languages, a comprehensive evaluation of their proficiency in low-resource languages such as Korean has been lacking. In this work, we introduce KoDialogBench, a benchmark designed to assess language models' conversational capabilities in Korean. To this end, we collect native Korean dialogues on daily topics from public sources, or translate dialogues from other languages. We then structure these conversations into diverse test datasets, spanning from dialogue comprehension to response selection tasks. Leveraging the proposed benchmark, we conduct extensive evaluations and analyses of various language models to measure a foundational understanding of Korean dialogues. Experimental results indicate that there exists significant room for improvement in models' conversation skills. Furthermore, our in-depth comparisons across different language models highlight the effectiveness of recent training techniques in enhancing conversational proficiency. We anticipate that KoDialogBench will promote the progress towards conversation-aware Korean language models.
