DeepHateExplainer: Explainable Hate Speech Detection in Under-resourced Bengali Language

Abstract

The exponential growth of social media and micro-blogging sites not only provides platforms for empowering freedom of expression and individual voices, but also enables people to express anti-social behavior such as online harassment, cyberbullying, and hate speech. Numerous works have used textual data to analyze social and anti-social behavior, predicting the contexts mostly for high-resource languages such as English. However, some languages, e.g., South Asian languages like Bengali, are under-resourced and lack the computational resources needed for accurate natural language processing (NLP). In this paper, we propose an explainable approach for hate speech detection in the under-resourced Bengali language, which we call DeepHateExplainer. Bengali texts are first comprehensively preprocessed and then classified into political, personal, geopolitical, and religious hate using a neural ensemble of transformer-based architectures (i.e., monolingual Bangla BERT-base, multilingual BERT-cased/uncased, and XLM-RoBERTa). Important (most and least relevant) terms are then identified using sensitivity analysis and layer-wise relevance propagation (LRP), and these are used to provide human-interpretable explanations. Finally, we compute comprehensiveness and sufficiency scores to measure the quality of the explanations with respect to faithfulness. Evaluations against machine learning (linear and tree-based models) and neural network (i.e., CNN, Bi-LSTM, and Conv-LSTM with word embeddings) baselines yield F1-scores of 78%, 91%, 89%, and 84% for political, personal, geopolitical, and religious hate, respectively, outperforming both the ML and DNN baselines.
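The abstract does not state how the outputs of the transformer models are combined in the neural ensemble. The sketch below assumes soft voting, i.e., averaging each model's class probabilities and taking the argmax. The function names, the optional per-model weights, the class index order, and the logit values are all hypothetical stand-ins for fine-tuned Bangla BERT, mBERT, and XLM-RoBERTa outputs, not the authors' implementation.

```python
from typing import Optional, Sequence

import numpy as np

# Hate categories named in the abstract; this index order is an assumption.
CLASSES = ["political", "personal", "geopolitical", "religious"]


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def soft_vote(model_logits: Sequence[np.ndarray],
              weights: Optional[Sequence[float]] = None) -> np.ndarray:
    """Average (optionally weighted) class probabilities across models.

    model_logits: one (n_samples, n_classes) logit array per model.
    Returns the predicted class index for each sample.
    """
    # Stack per-model probabilities: (n_models, n_samples, n_classes)
    probs = np.stack([softmax(logits) for logits in model_logits])
    w = np.ones(len(model_logits)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the average is still a probability distribution
    avg = np.tensordot(w, probs, axes=1)  # weighted mean over models -> (n_samples, n_classes)
    return avg.argmax(axis=-1)


# Hypothetical logits from three fine-tuned models for two Bengali posts.
bangla_bert = np.array([[2.0, 0.1, 0.1, 0.1], [0.1, 0.2, 3.0, 0.1]])
mbert = np.array([[1.5, 0.5, 0.2, 0.0], [0.0, 0.1, 2.0, 0.5]])
xlmr = np.array([[0.2, 1.0, 0.1, 0.1], [0.3, 0.1, 1.0, 0.2]])

preds = soft_vote([bangla_bert, mbert, xlmr])
print([CLASSES[i] for i in preds])  # ['political', 'geopolitical']
```

Soft voting lets a confident model outweigh a hesitant one (here the first post is labeled political even though one model leans towards personal hate); hard majority voting or learned stacking are equally plausible alternatives.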
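To make the two attribution methods named in the abstract concrete, the sketch below applies them to a toy bag-of-words linear softmax classifier instead of a transformer: sensitivity analysis scores each input feature by the squared gradient of the class probability, and LRP with the epsilon rule redistributes the class logit onto the features that produced it. The vocabulary, weights, counts, and the `eps` value are hypothetical; a real transformer would need a framework-level LRP implementation.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Softmax over a 1-D logit vector (max-shifted for stability)."""
    e = np.exp(z - z.max())
    return e / e.sum()


def sensitivity(x: np.ndarray, W: np.ndarray, b: np.ndarray, c: int) -> np.ndarray:
    """Sensitivity analysis: squared gradient of p(class c) w.r.t. each input feature."""
    p = softmax(x @ W + b)
    # For z = x @ W + b:  d p_c / d x_i = p_c * (W[i, c] - sum_k p_k * W[i, k])
    grad = p[c] * (W[:, c] - W @ p)
    return grad ** 2


def lrp_epsilon(x: np.ndarray, W: np.ndarray, b: np.ndarray, c: int,
                eps: float = 1e-6) -> np.ndarray:
    """LRP-epsilon: redistribute the class-c logit onto the input features."""
    z_c = float(x @ W[:, c] + b[c])
    stabiliser = eps if z_c >= 0 else -eps  # keeps the denominator away from zero
    contributions = x * W[:, c]  # z_ic = x_i * w_ic
    return contributions * z_c / (z_c + stabiliser)


# Hypothetical 4-token vocabulary, weights (n_tokens x n_classes), and counts.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
W = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [-1.0, 0.0, 0.0, 0.0]])
b = np.zeros(4)
x = np.array([2.0, 1.0, 0.0, 1.0])

c = int(np.argmax(x @ W + b))  # explain the predicted class
relevance = lrp_epsilon(x, W, b, c)
order = np.argsort(-relevance, kind="stable")  # most to least relevant
most, least = vocab[order[0]], vocab[order[-1]]
sens = sensitivity(x, W, b, c)
print(most, least)  # tok_a tok_d
```

The two scores answer different questions: `tok_c` does not occur in `x`, so it receives zero LRP relevance, yet its sensitivity is non-zero because adding it would still change the predicted probability.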
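The abstract uses comprehensiveness and sufficiency to judge faithfulness but does not define them. A minimal sketch, assuming the definitions commonly used in the literature (e.g., the ERASER benchmark): comprehensiveness is the drop in the predicted class probability when the rationale tokens are removed, and sufficiency is the drop when only the rationale tokens are kept. The toy two-class model, its token weights, and the token strings are hypothetical placeholders for a fine-tuned classifier.

```python
import math
from typing import AbstractSet, Callable, Sequence

# A classifier maps a token sequence to a list of class probabilities.
ProbaFn = Callable[[Sequence[str]], Sequence[float]]


def comprehensiveness(predict_proba: ProbaFn, tokens: Sequence[str],
                      rationale: AbstractSet[int], label: int) -> float:
    """Drop in p(label) when the rationale tokens are removed (higher = more faithful)."""
    full = predict_proba(tokens)[label]
    without = [t for i, t in enumerate(tokens) if i not in rationale]
    return full - predict_proba(without)[label]


def sufficiency(predict_proba: ProbaFn, tokens: Sequence[str],
                rationale: AbstractSet[int], label: int) -> float:
    """Drop in p(label) when only the rationale tokens are kept (lower = more faithful)."""
    full = predict_proba(tokens)[label]
    only = [t for i, t in enumerate(tokens) if i in rationale]
    return full - predict_proba(only)[label]


# Hypothetical two-class toy model: the class-1 logit is a sum of token weights.
TOY_WEIGHTS = {"w_hate": 3.0, "w_x": 0.5}


def toy_predict_proba(tokens: Sequence[str]) -> Sequence[float]:
    s = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    p1 = 1.0 / (1.0 + math.exp(-s))  # sigmoid of the logit
    return [1.0 - p1, p1]


tokens = ["w_hate", "w_x", "w_y"]
rationale = {0}  # e.g. the single most relevant term from an attribution method
comp = comprehensiveness(toy_predict_proba, tokens, rationale, label=1)
suff = sufficiency(toy_predict_proba, tokens, rationale, label=1)
print(f"comprehensiveness={comp:.3f}, sufficiency={suff:.3f}")
# comprehensiveness=0.348, sufficiency=0.018
```

A faithful explanation should score high on comprehensiveness and low on sufficiency, as the single heavily weighted token does here.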
