Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch

Abstract

To meet the demands of content moderation, online platforms have resorted to automated systems. Newer forms of real-time engagement ($\textit{e.g.}$, users commenting on live streams) on platforms like Twitch exert additional pressures on the latency expected of such moderation systems. Despite their prevalence, relatively little is known about the effectiveness of these systems. In this paper, we conduct an audit of Twitch's automated moderation tool ($\texttt{AutoMod}$) to investigate its effectiveness in flagging hateful content. For our audit, we create streaming accounts to act as siloed testbeds, and interface with the live chat using Twitch's APIs to send over $107,000$ comments collated from $4$ datasets. We measure $\texttt{AutoMod}$'s accuracy in flagging blatantly hateful content containing misogyny, racism, ableism and homophobia. Our experiments reveal that a large fraction of hateful messages, up to $94\%$ on some datasets, $\textit{bypass moderation}$. Contextual addition of slurs to these messages results in $100\%$ removal, revealing $\texttt{AutoMod}$'s reliance on slurs as a moderation signal. We also find that contrary to Twitch's community guidelines, $\texttt{AutoMod}$ blocks up to $89.5\%$ of benign examples that use sensitive words in pedagogical or empowering contexts. Overall, our audit points to large gaps in $\texttt{AutoMod}$'s capabilities and underscores the importance for such systems to understand context effectively.
