Too good to be true? Predicting author profiles from abusive language

Abstract

The problem of online threats and abuse could potentially be mitigated with a computational approach, where sources of abuse are better understood or identified through author profiling. However, abusive language constitutes a specific domain of language for which it has not yet been tested whether differences emerge based on a text author's personality, age, or gender. This study examines statistical relationships between author demographics and abusive vs normal language, and performs prediction experiments for personality, age, and gender. Although some statistical relationships were established between author characteristics and language use, these patterns did not translate to high prediction performance. Personality traits were predicted within 15% of their actual value, age was predicted with an error margin of 10 years, and gender was classified correctly in 70% of the cases. These results are poor when compared to previous research on author profiling; we therefore urge caution in applying author profiling within the context of abusive language and threat assessment.
