Ko-MuSR: A Multistep Soft Reasoning Benchmark for LLMs Capable of Understanding Korean

  • 2025-10-28 07:42:59
  • Chanwoo Park, Suyoung Park, JiA Kang, Jongyeon Park, Sangho Kim, Hyunji M. Park, Sumin Bae, Mingyu Kang, Jaejin Lee

Abstract

We present Ko-MuSR, the first benchmark to comprehensively evaluate multistep, soft reasoning in long Korean narratives while minimizing data contamination. Built following MuSR, Ko-MuSR features fully Korean narratives, reasoning chains, and multiple-choice questions verified by human annotators for logical consistency and answerability. Evaluations of four large language models -- two multilingual and two Korean-specialized -- show that multilingual models outperform Korean-focused ones even in Korean reasoning tasks, indicating cross-lingual generalization of reasoning ability. Carefully designed prompting strategies, which combine few-shot examples, reasoning traces, and task-specific hints, further boost accuracy, approaching human-level performance. Ko-MuSR offers a solid foundation for advancing Korean NLP by enabling systematic evaluation of long-context reasoning and prompting strategies.

 
