Automatic Expansion and Retargeting of Arabic Offensive Language Training

Abstract

Rampant use of offensive language on social media has led to recent efforts on automatic identification of such language. Though offensive language has general characteristics, attacks on specific entities may exhibit distinct phenomena such as malicious alterations in the spelling of names. In this paper, we present a method for identifying entity-specific offensive language. We employ two key insights, namely that replies on Twitter often imply opposition and that some accounts are persistent in their offensiveness towards specific targets. Using our methodology, we are able to collect thousands of targeted offensive tweets. We show the efficacy of the approach on Arabic tweets with 13% and 79% relative F1-measure improvement in entity-specific offensive language detection when using deep-learning-based and support-vector-machine-based classifiers respectively. Further, expanding the training set with automatically identified offensive tweets directed at multiple entities can improve F1-measure by 48%.
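
The two insights above can be illustrated with a minimal sketch. It is not the method from the paper: the tuple layout `(author, replied_to, text)`, the function names, and the `min_replies` threshold are all hypothetical, and repeated replying is used only as a stand-in for persistence. A real pipeline would still need a signal that the replies are actually offensive.

```python
from collections import Counter


def persistent_repliers(replies, target, min_replies=3):
    """Return accounts that replied to `target` at least `min_replies` times.

    `replies` is a list of (author, replied_to, text) tuples.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    counts = Counter(
        author for author, replied_to, _ in replies if replied_to == target
    )
    return {author for author, n in counts.items() if n >= min_replies}


def candidate_tweets(replies, target, min_replies=3):
    """Collect the replies to `target` written by persistent repliers.

    These are only candidates for targeted offensive tweets; they would
    still need filtering or labelling before being used as training data.
    """
    accounts = persistent_repliers(replies, target, min_replies)
    return [
        text
        for author, replied_to, text in replies
        if replied_to == target and author in accounts
    ]
```

Candidates gathered this way for several targets could then be added to a training set, which is the kind of expansion the abstract describes.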
