Abstract
Patronizing and Condescending Language (PCL) is a form of discriminatory toxic speech targeting vulnerable groups, threatening both online and offline safety. While toxic speech research has mainly focused on overt toxicity, such as hate speech, microaggressions in the form of PCL remain underexplored. Additionally, dominant groups' discriminatory facial expressions and attitudes toward vulnerable communities can be more impactful than verbal cues, yet these frame features are often overlooked. In this paper, we introduce the PCLMM dataset, the first Chinese multimodal dataset for PCL, consisting of 715 annotated videos from Bilibili, with high-quality PCL facial frame spans. We also propose the MultiPCL detector, featuring a facial expression detection module for PCL recognition, demonstrating the effectiveness of modality complementarity in this challenging task. Our work makes an important contribution to advancing microaggression detection within the domain of toxic speech.