Abstract
Large Language Models (LLMs) have become an increasingly important tool in research and society at large. While LLMs are regularly used all over the world by experts and laypeople alike, they are predominantly developed with English-speaking users in mind, performing well in English and other widespread languages, while less-resourced languages such as Luxembourgish are seen as a lower priority. This lack of attention is also reflected in the sparsity of available evaluation tools and datasets. In this study, we investigate the viability of language proficiency exams as such evaluation tools for the Luxembourgish language. We find that large models such as ChatGPT, Claude and DeepSeek-R1 typically achieve high scores, while smaller models show weak performance. We also find that performance in such language exams can be used to predict performance in other NLP tasks.