WOLI at SemEval-2020 Task 12: Arabic Offensive Language Identification on Different Twitter Datasets

Abstract

Communicating through social platforms has become one of the principal means of personal communication and interaction. Unfortunately, healthy communication is often disrupted by offensive language that can have damaging effects on users. A key to fighting offensive language on social media is the existence of an automatic offensive language detection system. This paper presents the results and the main findings of SemEval-2020 Task 12 (OffensEval), Sub-task A (Zampieri et al., 2020), on identifying and categorising offensive language in social media. The task was based on the Arabic OffensEval dataset (Mubarak et al., 2020). We describe the system submitted by WideBot AI Lab for the shared task, which ranked 10th out of 52 participants with a Macro-F1 of 86.9% on the golden dataset under the CodaLab username "yasserotiefy". We experimented with various models; the best model is a linear SVM that uses a combination of character and word n-grams. We also introduced a neural network approach, comprising CNN, highway network, Bi-LSTM, and attention layers, that enhanced the predictive ability of our system.
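To illustrate what combining character and word n-grams can look like, here is a minimal sketch in plain Python. It is not the submitted system: the function names, the n-gram ranges (character 2 to 4, word 1 to 2), and the "c:"/"w:" prefixes are assumptions made for illustration, since the abstract does not state them. The resulting counts would typically be weighted (for example with TF-IDF) and passed to a linear classifier such as an SVM.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of length n_min..n_max (assumed range)."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def word_ngrams(text, n_min=1, n_max=2):
    """Return all whitespace-token n-grams of length n_min..n_max (assumed range)."""
    tokens = text.split()
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return grams


def combined_features(text):
    """Count character and word n-grams in one sparse feature dictionary.

    The "c:" and "w:" prefixes (hypothetical naming) keep the two feature
    families from colliding when a word equals a character n-gram.
    """
    features = Counter()
    features.update("c:" + g for g in char_ngrams(text))
    features.update("w:" + g for g in word_ngrams(text))
    return features


if __name__ == "__main__":
    # Toy input; real input would be a preprocessed tweet.
    for name, count in sorted(combined_features("ab cd").items()):
        print(name, count)
```

Character n-grams are often a reasonable choice for noisy social media text because they tolerate spelling variation, while word n-grams capture whole offensive terms and short phrases; combining both gives the classifier access to each kind of signal.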
