Towards scalable efficient on-device ASR with transfer learning

Abstract

Multilingual pretraining for transfer learning significantly boosts the robustness of low-resource monolingual ASR models. This study systematically investigates three main aspects: (a) the impact of transfer learning on model performance during initial training or fine-tuning, (b) the influence of transfer learning across dataset domains and languages, and (c) the effect on rare-word recognition compared to non-rare words. Our findings suggest that RNNT-loss pretraining, followed by monolingual fine-tuning with Minimum Word Error Rate (MinWER) loss, consistently reduces Word Error Rate (WER) across languages such as Italian and French. WER reductions (WERR) reach 36.2% and 42.8% compared to monolingual baselines for MLS and in-house datasets. Out-of-domain pretraining leads to 28% higher WERR than in-domain pretraining. Both rare and non-rare words benefit, with rare words showing greater improvements with out-of-domain pretraining, and non-rare words with in-domain pretraining.
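The abstract reports results as WER and WERR without spelling out how the two relate. The sketch below assumes the common conventions: WER is the word-level edit distance divided by the number of reference words, and WERR is the relative reduction of WER with respect to a baseline, expressed in percent. The function names and the numbers in the usage note are hypothetical illustrations, not values or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes whitespace tokenization; real ASR scoring usually applies
    text normalization first, which is omitted here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction (WERR) in percent, assuming the usual
    definition 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

Under this definition, a hypothetical baseline WER of 20.0 that drops to 15.0 after fine-tuning corresponds to a WERR of 25%; whether the paper computes WERR in exactly this way is an assumption.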
