Is Speech Emotion Recognition Language-Independent? Analysis of English and Bangla Languages using Language-Independent Vocal Features

  • 2021-11-21 09:28:49
  • Fardin Saad, Hasan Mahmud, Md. Alamin Shaheen, Md. Kamrul Hasan, Paresha Farastu
A language-agnostic approach to recognizing emotions from speech remains an incomplete and challenging task. In this paper, we used the Bangla and English languages to assess whether distinguishing emotions from speech is independent of language. The following emotions were categorized for this study: happiness, anger, neutral, sadness, disgust, and fear. We employed three Emotional Speech Sets, of which the first two were developed by native Bengali speakers in Bangla and English separately. The third was the Toronto Emotional Speech Set (TESS), which was developed by native English speakers from Canada. We carefully selected language-independent prosodic features, adopted a Support Vector Machine (SVM) model, and conducted three experiments to test our proposition. In the first experiment, we measured the performance of the three speech sets individually. In the second experiment, we recorded the classification rate obtained by combining the speech sets. Finally, in the third experiment, we measured the recognition rate by training and testing the model with different speech sets. Although this study reveals that Speech Emotion Recognition (SER) is mostly language-independent, there is some disparity in recognizing emotional states such as disgust and fear across these two languages. Moreover, our investigations suggest that non-native speakers convey emotions through speech much as they do in their native tongue.
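The three experiments described in the abstract can be sketched as a list of train/test pairings over the speech sets. This is only an illustrative outline: the corpus labels and the `experiment_plan` helper are hypothetical names, and the choice to enumerate every combination of two or more sets in the second experiment is an assumption, since the abstract does not say which combinations were used. Feature extraction and the SVM itself are left out.

```python
from itertools import combinations, permutations

# Hypothetical short labels for the three speech sets named in the abstract.
CORPORA = ["bangla_native", "english_by_bengali_speakers", "tess"]


def experiment_plan(corpora):
    """Return (experiment, train_corpora, test_corpora) tuples for the three setups."""
    plan = []
    # Experiment 1: each speech set on its own
    # (train and test portions come from the same corpus).
    for corpus in corpora:
        plan.append(("individual", (corpus,), (corpus,)))
    # Experiment 2: pooled speech sets. Enumerating every combination of
    # two or more corpora is an assumption, not taken from the abstract.
    for size in range(2, len(corpora) + 1):
        for combo in combinations(corpora, size):
            plan.append(("combined", combo, combo))
    # Experiment 3: train on one speech set, test on a different one.
    for train, test in permutations(corpora, 2):
        plan.append(("cross", (train,), (test,)))
    return plan


if __name__ == "__main__":
    for name, train, test in experiment_plan(CORPORA):
        print(f"{name:10s} train={'+'.join(train)} test={'+'.join(test)}")
```

Each tuple would then be handed to whatever routine extracts the prosodic features and fits the classifier; for the "individual" and "combined" rows, a held-out split or cross-validation inside the pooled data would be needed, which is again an assumption about the protocol.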


