Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?

  • 2022-06-13 09:56:20
  • Priyanshi Shah, Harveen Singh Chadha, Anirudh Gupta, Ankur Dhuriya, Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan

Abstract

We propose a new method for calculating error rates in Automatic Speech Recognition (ASR). The new metric targets languages that contain half characters and in which the same character can be written in different forms. We implement our methodology for Hindi, one of the major Indic languages, and we think the approach scales to other similar languages with large character sets. We call our metrics Alternate Word Error Rate (AWER) and Alternate Character Error Rate (ACER). We train our ASR models for Indic languages using wav2vec 2.0 (Baevski et al., 2020). Additionally, we use language models to improve model performance. Our results show a significant improvement in analyzing error rates at the word and character level: the interpretability of the ASR system improves by up to 3% in AWER and 7% in ACER for Hindi. Our experiments suggest that in languages with complex pronunciation, there are multiple ways of writing a word without changing its meaning. In such cases, AWER and ACER are more useful metrics than WER and CER. Further, we open-source a new 21-hour benchmarking dataset for Hindi along with the new metric scripts.
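The abstract does not spell out how AWER is computed, so the following is only a minimal sketch of one plausible reading, not the authors' released scripts. It assumes a word-level edit distance in which a hypothesis word counts as correct if it appears in a list of accepted alternate spellings of the reference word. The names `alternate_wer` and `alternates`, and the example spelling pair, are illustrative assumptions.

```python
from typing import Dict, List, Optional, Set


def _edit_distance(ref: List[str], hyp: List[str],
                   alternates: Dict[str, Set[str]]) -> int:
    """Token-level Levenshtein distance.

    A hypothesis token matches a reference token at zero cost when the two
    are identical or when the hypothesis token is a listed alternate
    spelling of the reference token (assumed behaviour, see lead-in).
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            match = r == h or h in alternates.get(r, set())
            cur.append(min(prev[j] + 1,                       # deletion
                           cur[j - 1] + 1,                    # insertion
                           prev[j - 1] + (0 if match else 1)))  # substitution
        prev = cur
    return prev[-1]


def alternate_wer(reference: str, hypothesis: str,
                  alternates: Optional[Dict[str, Set[str]]] = None) -> float:
    """Word error rate that tolerates accepted alternate spellings.

    With `alternates` left empty this reduces to the ordinary WER.
    """
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    if not ref_tokens:
        raise ValueError("reference must contain at least one word")
    errors = _edit_distance(ref_tokens, hyp_tokens, alternates or {})
    return errors / len(ref_tokens)


# Hypothetical example: two common spellings of the word "Hindi",
# one with an anusvara and one with a half "na".
reference = "मुझे हिंदी पसंद है"
hypothesis = "मुझे हिन्दी पसंद है"
alternates = {"हिंदी": {"हिन्दी"}}

print(alternate_wer(reference, hypothesis))              # 0.25
print(alternate_wer(reference, hypothesis, alternates))  # 0.0
```

Under these assumptions, ordinary WER penalizes the spelling variant as one error in four words, while the alternate-aware score treats it as correct. A character-level analogue would apply the same idea to character sequences instead of word tokens.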

 
