DeepHateExplainer: Explainable Hate Speech Detection in Under-resourced Bengali Language

Abstract

The exponential growth of social media and micro-blogging sites not only provides platforms for empowering freedom of expression and individual voices, but also enables people to express anti-social behavior like online harassment, cyberbullying, and hate speech. Numerous works have been proposed to utilize textual data for social and anti-social behavior analysis, by predicting the contexts, mostly for highly-resourced languages like English. However, some languages are under-resourced, e.g., South Asian languages like Bengali, which lack computational resources for accurate natural language processing (NLP). In this paper, we propose an explainable approach for hate speech detection in the under-resourced Bengali language, which we call DeepHateExplainer. In our approach, Bengali texts are first comprehensively preprocessed before being classified into political, personal, geopolitical, and religious hate by a neural ensemble of different transformer-based architectures (i.e., monolingual Bangla BERT-base, multilingual BERT-cased/uncased, and XLM-RoBERTa). Subsequently, the most and least important terms are identified with sensitivity analysis and layer-wise relevance propagation (LRP), before providing human-interpretable explanations. Finally, to measure the quality of the explanations (i.e., faithfulness), we compute comprehensiveness and sufficiency. Evaluations against machine learning (linear and tree-based models) and deep neural network (i.e., CNN, Bi-LSTM, and Conv-LSTM with word embeddings) baselines yield F1 scores of 84%, 90%, 88%, and 88% for political, personal, geopolitical, and religious hate, respectively, outperforming both ML and DNN baselines.
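
The two faithfulness measures named above can be illustrated with a minimal sketch. It assumes the definitions common in the rationale-evaluation literature: comprehensiveness is the drop in the predicted class probability when the highlighted terms are removed from the input, and sufficiency is the drop when only the highlighted terms are kept. The keyword-counting classifier, the English placeholder tokens, and all function names below are hypothetical stand-ins for illustration, not the DeepHateExplainer ensemble or its data.

```python
from typing import Callable, Dict, List, Set

LABELS = ["political", "personal", "geopolitical", "religious"]

# Hypothetical keyword lists; a real system would use a trained model.
KEYWORDS: Dict[str, Set[str]] = {
    "political": {"party", "vote"},
    "personal": {"idiot"},
    "geopolitical": {"border"},
    "religious": {"temple"},
}

Predictor = Callable[[List[str]], Dict[str, float]]


def toy_predict_proba(tokens: List[str]) -> Dict[str, float]:
    """Toy classifier: add-one smoothed keyword counts, normalised to sum to 1."""
    counts = {
        label: 1 + sum(tok in KEYWORDS[label] for tok in tokens)
        for label in LABELS
    }
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}


def comprehensiveness(
    predict: Predictor, tokens: List[str], rationale: Set[int], label: str
) -> float:
    """p(label | full input) - p(label | input with rationale tokens removed)."""
    full = predict(tokens)[label]
    reduced = [t for i, t in enumerate(tokens) if i not in rationale]
    return full - predict(reduced)[label]


def sufficiency(
    predict: Predictor, tokens: List[str], rationale: Set[int], label: str
) -> float:
    """p(label | full input) - p(label | rationale tokens only)."""
    full = predict(tokens)[label]
    kept = [t for i, t in enumerate(tokens) if i in rationale]
    return full - predict(kept)[label]


tokens = ["that", "party", "rigged", "the", "vote"]
rationale = {1, 4}  # positions of the terms an explainer marked as important

comp = comprehensiveness(toy_predict_proba, tokens, rationale, "political")
suff = sufficiency(toy_predict_proba, tokens, rationale, "political")
print(f"comprehensiveness={comp:.2f} sufficiency={suff:.2f}")
```

Under these assumed definitions, a higher comprehensiveness and a lower sufficiency both suggest that the highlighted terms account for the prediction.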
