Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

  • 2021-09-14 17:58:09
  • Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi


This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.

