indicnlp@kgp at DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages

Abstract

This paper presents the submission of the team indicnlp@kgp to the EACL 2021 shared task "Offensive Language Identification in Dravidian Languages." The task aimed to classify different types of offensive content in three code-mixed Dravidian language datasets. The work leverages existing state-of-the-art approaches to text classification by incorporating additional data and applying transfer learning to pre-trained models. Our final submission is an ensemble of an AWD-LSTM based model and two transformer architectures based on BERT and RoBERTa. We achieved weighted-average F1 scores of 0.97, 0.77, and 0.72 on the Malayalam-English, Tamil-English, and Kannada-English datasets, ranking 1st, 2nd, and 3rd on the respective tasks.
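
For readers unfamiliar with the metric, the weighted-average F1 reported above is the mean of per-class F1 scores weighted by each class's share of the true labels. The following is a minimal sketch of that standard definition in plain Python, not the task organizers' evaluation script; the function name `weighted_f1` and the labels `"off"` and `"not"` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative sketch)."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true examples of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight by the class's share of true labels
    return score


# Hypothetical toy labels, not data from the shared task.
y_true = ["off", "not", "not", "not"]
y_pred = ["off", "not", "not", "off"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class is weighted by its frequency, a score computed this way is dominated by the majority class when the label distribution is skewed.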
