Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

  • 2025-09-02 07:57:34
  • Takuma Udagawa, Yang Zhao, Hiroshi Kanayama, Bishwaranjan Bhattacharjee

Abstract

Large language models (LLMs) acquire general linguistic knowledge from massive-scale pretraining. However, pretraining data mainly comprised of web-crawled texts contain undesirable social biases which can be perpetuated or even amplified by LLMs. In this study, we propose an efficient yet effective annotation pipeline to investigate social biases in the pretraining corpora. Our pipeline consists of protected attribute detection to identify diverse demographics, followed by regard classification to analyze the language polarity towards each attribute. Through our experiments, we demonstrate the effect of our bias analysis and mitigation measures, focusing on Common Crawl as the most representative pretraining corpus.
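To make the two-stage structure concrete, the following is a minimal sketch of how detection followed by regard classification could be chained over a corpus. It is not the authors' implementation: the keyword lists, the polarity lexicon, and the function names (`detect_attributes`, `classify_regard`, `annotate`) are hypothetical stand-ins, and a real pipeline would presumably replace the lexicon lookup with a trained classifier.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword lists; the actual detector is not specified in the abstract.
ATTRIBUTE_KEYWORDS = {
    "gender": {"woman", "women", "man", "men"},
    "age": {"elderly", "teenager", "teenagers"},
}

# Hypothetical polarity lexicon standing in for a trained regard classifier.
POSITIVE = {"brilliant", "kind", "respected"}
NEGATIVE = {"lazy", "dangerous", "stupid"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def detect_attributes(text):
    """Stage 1: return the protected attributes mentioned in the text."""
    tokens = set(tokenize(text))
    return sorted(attr for attr, kws in ATTRIBUTE_KEYWORDS.items() if tokens & kws)


def classify_regard(text):
    """Stage 2: label the text as positive, negative, or neutral."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def annotate(corpus):
    """Count regard labels per attribute; texts with no attribute are skipped."""
    counts = defaultdict(Counter)
    for text in corpus:
        attrs = detect_attributes(text)
        if not attrs:
            continue  # regard is only computed where an attribute was detected
        regard = classify_regard(text)
        for attr in attrs:
            counts[attr][regard] += 1
    return counts


corpus = [
    "The elderly are lazy.",
    "She is a brilliant woman.",
    "The weather is nice.",
]
counts = annotate(corpus)
```

Running the cheap detection stage first means the regard stage only sees texts that mention a demographic, which is one plausible reading of why such a pipeline can be efficient on a corpus the size of Common Crawl.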

 
