Experimental Evaluation of Deep Learning models for Marathi Text Classification

Abstract

The Marathi language is one of the prominent languages used in India. It is predominantly spoken by the people of Maharashtra. Over the past decade, the usage of the language on online platforms has increased tremendously. However, research on Natural Language Processing (NLP) approaches for Marathi text has not received much attention. Marathi is a morphologically rich language and uses a variant of the Devanagari script in the written form. This work aims to provide a comprehensive overview of available resources and models for Marathi text classification. We evaluate CNN, LSTM, ULMFiT, and BERT-based models on two publicly available Marathi text classification datasets and present a comparative analysis. The pre-trained Marathi FastText word embeddings by Facebook and IndicNLP are used in conjunction with word-based models. We show that basic single-layer models based on CNN and LSTM coupled with FastText embeddings perform on par with the BERT-based models on the available datasets. We hope our paper aids focused research and experiments in the area of Marathi NLP.
