G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German

Abstract

The advancement of natural language processing has paved the way for automated scoring systems in various languages, such as German (e.g., German BERT [G-BERT]). Automatically scoring written responses to science questions in German is a complex task and is challenging for standard G-BERT, which lacks contextual knowledge in the science domain and may be unaligned with student writing styles. This paper develops a contextualized German Science Education BERT (G-SciEdBERT), a language model tailored for scoring German-written responses to science tasks. Starting from G-BERT, we pre-trained G-SciEdBERT on a corpus of 50K German-written science responses (5M tokens) to the Programme for International Student Assessment (PISA) 2015. We then fine-tuned G-SciEdBERT on 59 assessment items, examined its scoring accuracy, and compared its performance with that of G-BERT. Our findings reveal a substantial improvement in scoring accuracy with G-SciEdBERT, demonstrating a 10% increase in quadratic weighted kappa compared to G-BERT (mean accuracy difference = 0.096, SD = 0.024). These insights underline the significance of specialized language models like G-SciEdBERT, which is trained to enhance the accuracy of automated scoring, offering a substantial contribution to the field of AI in education.
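The scoring results above are reported in terms of quadratic weighted kappa, an agreement statistic that penalizes a disagreement between two sets of ordinal scores by the squared distance between the assigned categories. The following is a minimal sketch of how this metric is commonly computed; it is not the authors' evaluation code. The function name, the integer score coding 0..K-1, and the toy scores are all assumptions made for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(scores_a, scores_b, num_categories):
    """Quadratic weighted kappa for integer scores coded 0..num_categories-1.

    Illustrative helper (hypothetical name), assuming both score lists
    rate the same responses in the same order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if num_categories < 2:
        raise ValueError("at least two score categories are required")

    n = len(scores_a)
    observed = Counter(zip(scores_a, scores_b))  # joint counts of (a, b) pairs
    hist_a = Counter(scores_a)                   # marginal counts for rater A
    hist_b = Counter(scores_b)                   # marginal counts for rater B
    max_dist = (num_categories - 1) ** 2         # largest squared distance

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(num_categories):
        for j in range(num_categories):
            weight = (i - j) ** 2 / max_dist
            observed_disagreement += weight * observed[(i, j)] / n
            # Agreement expected by chance, from the two marginals.
            expected_disagreement += weight * hist_a[i] * hist_b[j] / (n * n)

    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_disagreement / expected_disagreement


# Toy data (made up): human scores vs. model scores on a 0-2 scale.
human_scores = [0, 1, 2, 2]
model_scores = [0, 1, 1, 2]
print(quadratic_weighted_kappa(human_scores, model_scores, num_categories=3))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so a gain of about 0.1 in this statistic is a sizeable change on that scale.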
