A Deep Learning Approach for Similar Languages, Varieties and Dialects

Abstract

Deep learning methods have recently become the prevailing approach for various tasks in natural language processing, speech recognition, image processing, and many other fields. To leverage this, we use deep-learning-based models: a Bidirectional Long Short-Term Memory (B-LSTM) network for dialect identification in Arabic and German broadcast speech, and a Long Short-Term Memory (LSTM) network for discriminating between similar languages. Two distinct B-LSTM models are built. One uses lexical features based on Large-Vocabulary Continuous Speech Recognition (LVCSR). The other uses fixed-length bottleneck features of 400 values per utterance, generated by an i-vector framework. The models were evaluated on the VarDial 2017 datasets for Arabic dialect identification (Egyptian, Gulf, Levantine, North African, and MSA) and German dialect identification (Basel, Bern, Lucerne, and Zurich), and also for the task of Discriminating between Similar Languages such as Bosnian, Croatian, and Serbian. The B-LSTM model achieved an accuracy of 0.246 on the lexical features and an accuracy of 0.577 on the i-vector bottleneck features.
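The abstract does not give the network configuration, so the following pure-Python sketch only illustrates the general idea of a bidirectional LSTM classifier. It shows a forward pass over the five Arabic dialect labels. One LSTM reads the sequence left to right and a second reads it right to left. Their final hidden states are concatenated and passed through a softmax layer. This is not the authors' implementation. The hidden size, the untrained random weights, the helper `ivector_to_sequence`, and the choice to split a 400-value per-utterance vector into 20 steps of 20 values are all hypothetical assumptions. Training is not shown.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function (avoids overflow in math.exp)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(W, v):
    # Matrix-vector product for list-of-lists matrices
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """LSTM cell with input (i), forget (f), output (o) gates and candidate (c)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t; h_{t-1}]
        self.W = {g: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}

    def step(self, x, h, c):
        z = list(x) + list(h)
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])] for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        c_tilde = [math.tanh(v) for v in pre["c"]]
        c_new = [fg * cp + ig * ct for fg, cp, ig, ct in zip(f, c, i, c_tilde)]
        h_new = [og * math.tanh(cn) for og, cn in zip(o, c_new)]
        return h_new, c_new

    def run(self, seq):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:
            h, c = self.step(x, h, c)
        return h  # final hidden state summarizes the whole sequence


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class BLSTMClassifier:
    """Forward pass of an (untrained) bidirectional LSTM sequence classifier."""

    def __init__(self, input_dim, hidden_dim, labels, seed=0):
        rng = random.Random(seed)
        self.labels = list(labels)
        self.fwd = LSTMCell(input_dim, hidden_dim, rng)  # reads left to right
        self.bwd = LSTMCell(input_dim, hidden_dim, rng)  # reads right to left
        self.W_out = rand_matrix(len(self.labels), 2 * hidden_dim, rng)
        self.b_out = [0.0] * len(self.labels)

    def predict_proba(self, seq):
        h_f = self.fwd.run(seq)
        h_b = self.bwd.run(list(reversed(seq)))
        logits = [a + b for a, b in zip(matvec(self.W_out, h_f + h_b), self.b_out)]
        return dict(zip(self.labels, softmax(logits)))


ARABIC_DIALECTS = ["Egyptian", "Gulf", "Levantine", "North African", "MSA"]


def ivector_to_sequence(vec, steps):
    """Hypothetical: split a fixed-length vector into `steps` equal chunks."""
    size = len(vec) // steps
    return [vec[k * size:(k + 1) * size] for k in range(steps)]


# Demo on a synthetic 400-value vector (random stand-in, not VarDial data)
demo_rng = random.Random(1)
demo_vec = [demo_rng.gauss(0.0, 1.0) for _ in range(400)]
demo_seq = ivector_to_sequence(demo_vec, steps=20)  # 20 steps x 20 values
demo_model = BLSTMClassifier(input_dim=20, hidden_dim=16, labels=ARABIC_DIALECTS)
demo_probs = demo_model.predict_proba(demo_seq)
print(max(demo_probs, key=demo_probs.get), demo_probs)
```

Because the weights are random, the printed prediction is meaningless until the model is trained, for example with backpropagation through time in a deep learning framework. The sketch uses only the final hidden state of each direction. Other pooling choices, such as averaging hidden states over time, are equally plausible and are not specified by the abstract.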
