White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs

  • 2025-05-30 23:39:05
  • Yixin Wan, Kai-Wei Chang

Abstract

Social biases can manifest in language agency. However, very limited research has investigated such biases in Large Language Model (LLM)-generated content. In addition, previous works often rely on string-matching techniques to identify agentic and communal words within texts, which falls short of accurately classifying language agency. We introduce the Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs by analyzing the agency levels attributed to different demographic groups in model generations. LABE tests for gender, racial, and intersectional language agency biases in LLMs on three text generation tasks: biographies, professor reviews, and reference letters. Using LABE, we unveil language agency social biases in three recent LLMs: ChatGPT, Llama3, and Mistral. We observe that: (1) LLM generations tend to demonstrate greater gender bias than human-written texts; (2) models demonstrate remarkably higher levels of intersectional bias than the other bias aspects; and (3) prompt-based mitigation is unstable and frequently leads to bias exacerbation. Based on these observations, we propose Mitigation via Selective Rewrite (MSR), a novel bias mitigation strategy that leverages an agency classifier to identify and selectively revise the parts of generated texts that demonstrate communal traits. Empirical results show MSR to be more effective and reliable than prompt-based mitigation, indicating a promising research direction.
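The selective-rewrite idea behind MSR can be illustrated with a minimal sketch: split a generated text into sentences, classify each as agentic or communal, and rewrite only the communal ones, leaving the rest untouched. Everything here is a hypothetical stand-in, not the authors' implementation. `toy_classify` is a keyword lookup (exactly the kind of string matching the abstract says falls short; MSR itself relies on a trained agency classifier), `toy_rewrite` is a word-swap placeholder for whatever rewriter one would plug in (for example, a prompted LLM, which is an assumption), and `agency_ratio` and `agency_gap` are illustrative measures, not LABE's published metric.

```python
import re
from typing import Callable, List

# Hypothetical interfaces: a classifier labels one sentence, a rewriter revises one sentence.
Classifier = Callable[[str], str]  # returns "agentic" or "communal"
Rewriter = Callable[[str], str]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after . ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def agency_ratio(text: str, classify: Classifier) -> float:
    """Fraction of sentences labeled agentic (illustrative measure, not LABE's metric)."""
    sents = split_sentences(text)
    if not sents:
        return 0.0
    return sum(classify(s) == "agentic" for s in sents) / len(sents)


def agency_gap(texts_a: List[str], texts_b: List[str], classify: Classifier) -> float:
    """Hypothetical group comparison: mean agency ratio of group A minus group B."""
    mean = lambda ts: sum(agency_ratio(t, classify) for t in ts) / len(ts)
    return mean(texts_a) - mean(texts_b)


def selective_rewrite(text: str, classify: Classifier, rewrite: Rewriter) -> str:
    """MSR-style loop: rewrite only sentences the classifier marks as communal."""
    return " ".join(
        rewrite(s) if classify(s) == "communal" else s for s in split_sentences(text)
    )


# --- Toy stand-ins (hypothetical; replace with a trained classifier and a real rewriter) ---
COMMUNAL_WORDS = {"kind", "helpful", "warm", "supportive", "caring"}
AGENTIC_SWAPS = {
    "kind": "decisive",
    "helpful": "instrumental",
    "warm": "confident",
    "supportive": "influential",
    "caring": "ambitious",
}


def toy_classify(sentence: str) -> str:
    """Keyword lookup: any communal word makes the sentence 'communal'."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return "communal" if words & COMMUNAL_WORDS else "agentic"


def toy_rewrite(sentence: str) -> str:
    """Swap communal words for agentic ones, preserving an initial capital."""
    def swap(m: re.Match) -> str:
        word = m.group(0)
        rep = AGENTIC_SWAPS[word.lower()]
        return rep.capitalize() if word[0].isupper() else rep

    pattern = r"\b(" + "|".join(AGENTIC_SWAPS) + r")\b"
    return re.sub(pattern, swap, sentence, flags=re.IGNORECASE)


if __name__ == "__main__":
    letter = "She led the project. She is kind and helpful to her peers."
    print(agency_ratio(letter, toy_classify))
    print(selective_rewrite(letter, toy_classify, toy_rewrite))
```

In this toy run the first sentence is kept verbatim and only the second is revised, so the agency ratio rises from 0.5 to 1.0. Rewriting at the sentence level, rather than regenerating the whole text, is what keeps content the classifier already considers agentic intact.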
