Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models

Abstract

This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For Offensive Language Identification, we proposed a multi-lingual method using Pre-trained Language Models, ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A (Offensive Language Identification), we ranked first in terms of average F1 score across all languages. We are also the only team that ranked among the top three in every language. We also took first place in Sub-task B (Automatic Categorization of Offense Types) and Sub-task C (Offence Target Identification).
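The abstract does not say how the soft labels from the supervised models are combined or how the distilled model is trained on them. As a rough illustration only, the sketch below assumes each soft label is the plain average of the teacher models' class probabilities and that the student minimizes cross-entropy against those soft targets. The function names (`average_soft_labels`, `soft_cross_entropy`) are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def average_soft_labels(
    teacher_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average class probabilities from several teacher models (assumed scheme).

    teacher_probs[t][i][c] is the probability teacher t assigns to class c
    for example i. Returns one soft-label distribution per example.
    """
    n_teachers = len(teacher_probs)
    n_examples = len(teacher_probs[0])
    n_classes = len(teacher_probs[0][0])
    return [
        [
            sum(teacher_probs[t][i][c] for t in range(n_teachers)) / n_teachers
            for c in range(n_classes)
        ]
        for i in range(n_examples)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(
    student_logits: Sequence[Sequence[float]],
    soft_labels: Sequence[Sequence[float]],
) -> float:
    """Mean cross-entropy between soft targets and the student's predictions."""
    loss = 0.0
    for logits, target in zip(student_logits, soft_labels):
        probs = softmax(logits)
        loss -= sum(p_t * math.log(p_s) for p_t, p_s in zip(target, probs))
    return loss / len(student_logits)


if __name__ == "__main__":
    # Two hypothetical teachers scoring one example on a two-class task.
    soft = average_soft_labels([[[0.9, 0.1]], [[0.7, 0.3]]])
    print(soft)
    print(soft_cross_entropy([[0.0, 5.0]], soft))
```

Averaging keeps the teachers' disagreement in the target (roughly 0.8/0.2 here rather than a hard 1/0 label), which is the usual motivation for training on soft labels instead of a single voted class.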
