Dialect prejudice predicts AI decisions about people's character, employability, and criminality

Abstract

Hundreds of millions of people now interact with language models, with uses ranging from serving as a writing aid to informing hiring decisions. Yet these language models are known to perpetuate systematic racial prejudices, making their judgments biased in problematic ways about groups like African Americans. While prior research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice: we extend research showing that Americans hold raciolinguistic stereotypes about speakers of African American English and find that language models have the same prejudice, exhibiting covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, although closest to the ones from before the civil rights movement. By contrast, the language models' overt stereotypes about African Americans are much more positive. We demonstrate that dialect prejudice has the potential for harmful consequences by asking language models to make hypothetical decisions about people, based only on how they speak. Language models are more likely to suggest that speakers of African American English be assigned less prestigious jobs, be convicted of crimes, and be sentenced to death. Finally, we show that existing methods for alleviating racial bias in language models such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching language models to superficially conceal the racism that they maintain on a deeper level. Our findings have far-reaching implications for the fair and safe employment of language technology.
