Don't Patronize Me! An Annotated Dataset with Patronizing and Condescending Language towards Vulnerable Communities

Abstract

In this paper, we introduce a new annotated dataset which is aimed at supporting the development of NLP models to identify and categorize language that is patronizing or condescending towards vulnerable communities (e.g. refugees, homeless people, poor families). While the prevalence of such language in the general media has long been shown to have harmful effects, it differs from other types of harmful language, in that it is generally used unconsciously and with good intentions. We furthermore believe that the often subtle nature of patronizing and condescending language (PCL) presents an interesting technical challenge for the NLP community. Our analysis of the proposed dataset shows that identifying PCL is hard for standard NLP models, with language models such as BERT achieving the best results.
