Cross-lingual Inductive Transfer to Detect Offensive Language

Abstract

With the growing use and availability of social media, many instances of offensive language have been observed across multiple languages and domains. This has created a growing need to detect offensive language in social media cross-lingually. For OffensEval 2020, the organizers released the \textit{multilingual Offensive Language Identification Dataset} (mOLID), which contains tweets in five different languages, for detecting offensive language. In this work, we introduce a cross-lingual inductive approach to identify offensive language in tweets using the contextual word embedding \textit{XLM-RoBERTa} (XLM-R). We show that our model performs competitively on all five languages, obtaining the fourth position in the English task with an F1-score of $0.919$ and the eighth position in the Turkish task with an F1-score of $0.781$. Further experiments show that our model also performs competitively in a zero-shot learning environment and is extensible to other languages.
