The Enemy Among Us: Detecting Hate Speech with Threats Based 'Othering' Language Embeddings

Abstract

Offensive or antagonistic language targeted at individuals and social groups based on their personal characteristics (also known as cyber hate speech or cyberhate) has been frequently posted and widely circulated via the World Wide Web. This can be considered as a key risk factor for individual and societal tension linked to regional instability. Automated Web-based cyberhate detection is important for observing and understanding community and regional societal tension - especially in online social networks where posts can be rapidly and widely viewed and disseminated. While previous work has involved using lexicons, bags-of-words or probabilistic language parsing approaches, they often suffer from a similar issue, which is that cyberhate can be subtle and indirect - thus depending on the occurrence of individual words or phrases can lead to a significant number of false negatives, providing an inaccurate representation of the trends in cyberhate. This problem motivated us to challenge thinking around the representation of subtle language use, such as references to perceived threats from "the other" including immigration or job prosperity in a hateful context. We propose a novel framework that utilises language use around the concept of "othering" and intergroup threat theory to identify these subtleties, and we implement a novel classification method using embedding learning to compute semantic distances between parts of speech considered to be part of an "othering" narrative. To validate our approach we conduct several experiments on different types of cyberhate, namely religion, disability, race and sexual orientation, with F-measure scores for classifying hateful instances obtained through applying our model of 0.93, 0.86, 0.97 and 0.98 respectively, providing a significant improvement in classifier accuracy over the state of the art.
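
As a rough illustration of what "semantic distance" between embedded terms can mean, the sketch below computes the cosine distance between small vectors. It is not the paper's method: the abstract does not specify the distance measure or the embedding model, so cosine distance is an assumption here, and the three-dimensional vectors in `toy_embeddings` are invented values for illustration, not learned embeddings.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy vectors (made up for this example, not from the paper).
toy_embeddings = {
    "we": [0.9, 0.1, 0.0],
    "us": [0.8, 0.2, 0.1],
    "them": [0.1, 0.9, 0.3],
}

# Distance between two in-group pronouns versus an in-group/out-group pair.
d_in = cosine_distance(toy_embeddings["we"], toy_embeddings["us"])
d_out = cosine_distance(toy_embeddings["we"], toy_embeddings["them"])
print(d_in < d_out)
```

With these invented vectors the in-group pair sits closer together than the in-group/out-group pair; with real learned embeddings the relationship would depend on the training data.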
