Danoliteracy of Generative Large Language Models

  • 2025-03-04 07:13:20
  • Søren Vejlgaard Holm, Lars Kai Hansen, Martin Carsten Nielsen

Abstract

The language technology moonshot moment of Generative Large Language Models (GLLMs) was not limited to English: these models brought a surge of technological applications, investments, and hype to low-resource languages as well. However, the capabilities of these models in languages such as Danish were, until recently, difficult to verify beyond qualitative demonstrations due to a lack of applicable evaluation corpora. We present a GLLM benchmark to evaluate Danoliteracy, a measure of Danish language and cultural competency across eight diverse scenarios such as Danish citizenship tests and abstractive social media question answering. This limited-size benchmark was found to produce a robust ranking that correlates with human feedback at $\rho \sim 0.8$, with GPT-4 and Claude Opus models achieving the highest rankings. Analyzing these model results across scenarios, we find one strong underlying factor explaining $95\%$ of scenario performance variance for GLLMs in Danish, suggesting a $g$ factor of model consistency in language adaptation.
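The two statistics quoted above can be illustrated with a small sketch. It assumes that $\rho$ denotes Spearman's rank correlation. It also approximates the "one strong underlying factor" with the first principal component of the column-standardized models × scenarios score matrix. The paper's exact procedure may differ; for example, it may fit a factor-analysis model rather than use PCA. All scores below are made-up toy numbers, and the function names are hypothetical.

```python
"""Illustrative sketch only: toy numbers, not results from the paper."""
from statistics import mean, pstdev


def ranks(values):
    """Return 1-based ranks (assumes distinct values; ties are not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def first_component_share(scores, iters=500):
    """Share of total variance carried by the first principal component of the
    column-standardized (models x scenarios) score matrix (PCA as an assumed
    stand-in for the paper's factor analysis)."""
    cols = list(zip(*scores))
    # Standardize each scenario column (population std, so the diagonal of corr is 1).
    z_cols = [[(v - mean(c)) / pstdev(c) for v in c] for c in cols]
    n, k = len(scores), len(cols)
    # Correlation matrix between scenarios (k x k).
    corr = [[sum(a * b for a, b in zip(z_cols[i], z_cols[j])) / n
             for j in range(k)] for i in range(k)]
    # Power iteration for the leading eigenvector of the (PSD) correlation matrix.
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue; trace of corr equals k.
    top = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    return top / k


benchmark = [0.72, 0.65, 0.58, 0.41, 0.30]  # made-up benchmark scores, 5 models
human = [0.66, 0.68, 0.50, 0.45, 0.20]      # made-up human-preference scores
print(spearman_rho(benchmark, human))       # one adjacent swap in the ranking

scores = [  # made-up scores: 4 models (rows) x 3 scenarios (columns)
    [0.90, 0.85, 0.80],
    [0.70, 0.66, 0.62],
    [0.50, 0.48, 0.40],
    [0.30, 0.29, 0.25],
]
print(first_component_share(scores))        # close to 1 for near one-factor data
```

A share near 1 means that a single latent dimension orders the models almost identically in every scenario. This is the kind of pattern the abstract summarizes as a $g$ factor.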

 
