Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey

Abstract

Recent advancements in deep learning (DL) have posed a significant challenge for automatic speech recognition (ASR). ASR relies on extensive training datasets, including confidential ones, and demands substantial computational and storage resources. Enabling adaptive systems improves ASR performance in dynamic environments. DL techniques assume that training and testing data originate from the same domain, which is not always true. Advanced DL techniques such as deep transfer learning (DTL), federated learning (FL), and reinforcement learning (RL) address these issues. DTL enables high-performance models to be built from small yet related datasets, FL enables training on confidential data without possessing the dataset, and RL optimizes decision-making in dynamic environments, reducing computation costs. This survey offers a comprehensive review of DTL-, FL-, and RL-based ASR frameworks, aiming to provide insights into the latest developments and to aid researchers and professionals in understanding the current challenges. Additionally, transformers, which are advanced DL techniques heavily used in proposed ASR frameworks, are considered in this survey for their ability to capture extensive dependencies in the input ASR sequence. The paper starts by presenting the background of DTL, FL, RL, and transformers and then adopts a well-designed taxonomy to outline the state-of-the-art approaches. Subsequently, a critical analysis is conducted to identify the strengths and weaknesses of each framework. Finally, a comparative study is presented to highlight the existing challenges, paving the way for future research opportunities.
