Analyzing and learning the language for different types of harassment

Abstract

Disclaimer: This paper is concerned with violent online harassment. To describe the subject at an adequate level of realism, examples of our collected tweets involve violent, threatening, vulgar, and hateful language in the context of racial, sexual, political, appearance-related, and intellectual harassment.

The presence of a significant amount of harassment in user-generated content and its negative impact call for robust automatic detection approaches. This requires that we can identify different forms or types of harassment. Earlier work has classified harassing language in terms of hurtfulness, abusiveness, sentiment, and profanity. However, to identify and understand harassment more accurately, it is essential to determine the context that represents the interrelated conditions in which it occurs. In this paper, we introduce the notion of contextual type for harassment, involving five categories: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We utilize an annotated corpus from Twitter that distinguishes these types of harassment. To study the context of each type, which sheds light on linguistic meaning, interpretation, and distribution, we conduct two lines of investigation: an extensive linguistic analysis and an analysis of the statistical distribution of unigrams. We then build type-aware classifiers to automate the identification of type-specific harassment. Our experiments demonstrate that these classifiers provide competitive accuracy for identifying and analyzing harassment on social media. We present an extensive discussion and major observations about the effectiveness of type-aware classifiers, using a detailed comparison setup that provides insight into the role of type-dependent features.
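
As a rough illustration of what a per-type unigram distribution could look like, the following minimal Python sketch computes relative token frequencies for each harassment type. It is not the authors' pipeline: the whitespace tokenization, the function name `unigram_distribution`, and the toy corpus of placeholder tokens are all assumptions made for illustration only.

```python
from collections import Counter


def unigram_distribution(texts):
    """Return the relative frequency of each lowercased whitespace token.

    Assumption: simple whitespace tokenization; the paper's actual
    preprocessing is not specified in the abstract.
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: n / total for tok, n in counts.items()}


# Hypothetical toy corpus keyed by harassment type. The strings are
# neutral placeholders, not data from the annotated Twitter corpus.
corpus = {
    "political": ["alpha beta alpha", "beta gamma"],
    "racial": ["delta alpha", "delta delta epsilon"],
}

# One unigram distribution per type, so that types can be compared.
distributions = {
    label: unigram_distribution(texts) for label, texts in corpus.items()
}
```

Comparing such per-type distributions is one simple way to see which unigrams are characteristic of a given type, and the resulting frequencies could plausibly serve as type-dependent features for a classifier.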
