Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination

Abstract

We present a large-scale study of linguistic bias exhibited by ChatGPT covering ten dialects of English (Standard American English, Standard British English, and eight widely spoken non-"standard" varieties from around the world). We prompted GPT-3.5 Turbo and GPT-4 with text by native speakers of each variety and analyzed the responses via detailed linguistic feature annotation and native speaker evaluation. We find that the models default to "standard" varieties of English; based on evaluation by native speakers, we also find that model responses to non-"standard" varieties consistently exhibit a range of issues: stereotyping (19% worse than for "standard" varieties), demeaning content (25% worse), lack of comprehension (9% worse), and condescending responses (15% worse). We also find that if these models are asked to imitate the writing style of prompts in non-"standard" varieties, they produce text that exhibits lower comprehension of the input and is especially prone to stereotyping. GPT-4 improves on GPT-3.5 in terms of comprehension, warmth, and friendliness, but also exhibits a marked increase in stereotyping (+18%). The results indicate that GPT-3.5 Turbo and GPT-4 can perpetuate linguistic discrimination toward speakers of non-"standard" varieties.
