LuxBank: The First Universal Dependency Treebank for Luxembourgish

  • 2024-11-07 15:50:40
  • Alistair Plum, Caroline Döhmer, Emilia Milano, Anne-Marie Lutgen, Christoph Purschke

Abstract

The Universal Dependencies (UD) project has significantly expanded linguistic coverage across 161 languages, yet Luxembourgish, a West Germanic language spoken by approximately 400,000 people, has remained absent until now. In this paper, we introduce LuxBank, the first UD Treebank for Luxembourgish, addressing the gap in syntactic annotation and analysis for this 'low-research' language. We establish formal guidelines for Luxembourgish language annotation, providing the foundation for the first large-scale quantitative analysis of its syntax. LuxBank serves not only as a resource for linguists and language learners but also as a tool for developing spell checkers and grammar checkers, organising existing text archives and even training large language models. By incorporating Luxembourgish into the UD framework, we aim to enhance the understanding of syntactic variation within West Germanic languages and offer a model for documenting smaller, semi-standardised languages. This work positions Luxembourgish as a valuable resource in the broader linguistic and NLP communities, contributing to the study of languages with limited research and resources.

 
