Bias at a Second Glance: A Deep Dive into Bias for German Educational Peer-Review Data Modeling

Abstract

Natural Language Processing (NLP) is increasingly used to provide adaptivity in educational applications. However, recent research has highlighted a variety of biases in pre-trained language models. While existing studies investigate bias in different domains, they offer little fine-grained analysis of educational and multilingual corpora. In this work, we analyze bias across text and through multiple architectures on a corpus of 9,165 German peer-reviews collected from university students over five years. Notably, our corpus includes labels such as helpfulness, quality, and critical aspect ratings from the peer-review recipient as well as demographic attributes. We conduct a Word Embedding Association Test (WEAT) analysis on (1) our collected corpus in connection with the clustered labels, (2) the most common pre-trained German language models (T5, BERT, and GPT-2) and GloVe embeddings, and (3) the language models after fine-tuning on our collected dataset. In contrast to our initial expectations, we found that our collected corpus does not reveal many biases in the co-occurrence analysis or in the GloVe embeddings. However, the pre-trained German language models exhibit substantial conceptual, racial, and gender bias, and their bias changes significantly along conceptual and racial axes during fine-tuning on the peer-review data. With our research, we aim to contribute to the fourth UN sustainability goal (quality education) with a novel dataset, an understanding of biases in natural language education data, and an understanding of the potential harms of not counteracting biases in language models for educational tasks.
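
For readers unfamiliar with WEAT, the following is a minimal sketch of the standard effect-size computation, not the pipeline used in this work. It assumes two target sets X and Y and two attribute sets A and B, each given as a list of embedding vectors. The function names, the toy 2-D vectors, and the use of the sample standard deviation are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to A minus mean similarity of w to B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, normalized by the
    standard deviation of the associations over all target words.
    Assumption: sample standard deviation (n - 1 in the denominator)."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Made-up 2-D vectors for illustration only; real embeddings would come
# from a model such as GloVe or BERT.
X = [(1.0, 0.1), (0.9, 0.2)]  # hypothetical target set 1
Y = [(0.1, 1.0), (0.2, 0.9)]  # hypothetical target set 2
A = [(1.0, 0.0)]              # hypothetical attribute set 1
B = [(0.0, 1.0)]              # hypothetical attribute set 2

d = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {d:.3f}")
```

A positive effect size indicates that X is more strongly associated with A (and Y with B) than the reverse. Swapping X and Y flips the sign.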
