UD-KSL Treebank v1.3: A semi-automated framework for aligning XPOS-extracted units with UPOS tags

Abstract

The present study extends recent work on Universal Dependencies annotations for second-language (L2) Korean by introducing a semi-automated framework that identifies morphosyntactic constructions from XPOS sequences and aligns those constructions with corresponding UPOS categories. We also broaden the existing L2-Korean corpus by annotating 2,998 new sentences from argumentative essays. To evaluate the impact of XPOS-UPOS alignments, we fine-tune L2-Korean morphosyntactic analysis models on datasets both with and without these alignments, using two NLP toolkits. Our results indicate that the aligned dataset not only improves consistency across annotation layers but also enhances morphosyntactic tagging and dependency-parsing accuracy, particularly when annotated data are limited.
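The abstract does not specify how constructions are represented or matched. The following is a minimal sketch, assuming Sejong-style XPOS strings whose morpheme tags are joined by "+" (e.g. "NNG+JKS"), as in several Korean UD treebanks. It illustrates how an XPOS sequence might be mapped to a UPOS tag and how disagreements between the two annotation layers could be flagged for review. The rule tables and the function names `align_xpos_to_upos` and `find_inconsistencies` are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical construction rules: an XPOS pattern, matched as a prefix of the
# token's morpheme-tag sequence, paired with the UPOS it implies.
# Rules are checked in order, so more specific patterns should come first.
CONSTRUCTION_RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("NNG", "XSV"), "VERB"),  # noun + verb-deriving suffix, e.g. 공부 + 하
    (("NNG", "XSA"), "ADJ"),   # noun + adjective-deriving suffix
]

# Hypothetical fallback: UPOS implied by the first (head) morpheme's XPOS tag.
HEAD_TAG_TO_UPOS: Dict[str, str] = {
    "NNG": "NOUN",
    "NNP": "PROPN",
    "NP": "PRON",
    "VV": "VERB",
    "VA": "ADJ",
    "MAG": "ADV",
    "SN": "NUM",
    "SF": "PUNCT",
}


def align_xpos_to_upos(xpos: str) -> Optional[str]:
    """Map a '+'-joined XPOS string to a UPOS tag, or None if no rule applies."""
    tags = tuple(xpos.split("+"))
    # Construction rules take precedence over the head-morpheme fallback.
    for pattern, upos in CONSTRUCTION_RULES:
        if tags[: len(pattern)] == pattern:
            return upos
    return HEAD_TAG_TO_UPOS.get(tags[0])


def find_inconsistencies(
    tokens: List[Tuple[str, str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (form, annotated UPOS, aligned UPOS) for each (form, XPOS, UPOS)
    token whose annotated UPOS disagrees with the XPOS-derived one."""
    flagged = []
    for form, xpos, upos in tokens:
        aligned = align_xpos_to_upos(xpos)
        if aligned is not None and aligned != upos:
            flagged.append((form, upos, aligned))
    return flagged


# Usage: a toy sentence with one deliberately inconsistent UPOS tag.
sentence = [
    ("학생이", "NNG+JKS", "NOUN"),
    ("공부했다", "NNG+XSV+EP+EF", "NOUN"),  # XPOS implies VERB
    (".", "SF", "PUNCT"),
]
print(find_inconsistencies(sentence))  # [('공부했다', 'NOUN', 'VERB')]
```

Checking construction patterns before the head-morpheme fallback lets a derived form such as 공부했다 be aligned with VERB even though its first morpheme is a noun, which is the kind of cross-layer mismatch the framework is meant to resolve.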
