Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset

Abstract

We present an experimental dataset, Basic Dataset for Sorani Kurdish Automatic Speech Recognition (BD-4SK-ASR), which we used in the first attempt at developing an automatic speech recognition system for Sorani Kurdish. The objective of the project was to develop a system that could automatically recognize simple sentences based on the vocabulary used in grades one to three of primary schools in the Kurdistan Region of Iraq. We used CMUSphinx as our experimental environment and developed the dataset to train the system. The dataset is publicly available for non-commercial use under the CC BY-NC-SA 4.0 license.
