A Multi-cascaded Deep Model for Bilingual SMS Classification

Abstract

Most studies on text classification focus on the English language. However, short texts such as SMS are influenced by regional languages. This makes the automatic text classification task challenging due to the multilingual, informal, and noisy nature of language in the text. In this work, we propose a novel multi-cascaded deep learning model called McM for bilingual SMS classification. McM exploits $n$-gram level information as well as long-term dependencies of text for learning. Our approach aims to learn a model without any code-switching indication, lexical normalization, language translation, or language transliteration. The model relies entirely upon the text, as no external knowledge base is utilized for learning. For this purpose, a 12-class bilingual text dataset is developed from SMS feedback of citizens on public services, containing mixed Roman Urdu and English. Our model achieves high accuracy for classification on this dataset and outperforms the previous model for multilingual text classification, highlighting the language independence of McM.
