Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning

  • 2020-07-03 17:43:12
  • Pavel Denisov, Ngoc Thang Vu

Abstract

Spoken language understanding is typically based on pipeline architectures including speech recognition and natural language understanding steps. Therefore, these components are optimized independently from each other and the overall system suffers from error propagation. In this paper, we propose a novel training method that enables pretrained contextual embeddings such as BERT to process acoustic features. In particular, we extend it with an encoder of pretrained speech recognition systems in order to construct end-to-end spoken language understanding systems. Our proposed method is based on the teacher-student framework across speech and text modalities that aligns the acoustic and the semantic latent spaces. Experimental results in three benchmark datasets show that our system reaches the pipeline architecture performance without using any training data and outperforms it after fine-tuning with only a few examples.
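
The cross-modal teacher-student idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the training objective, so everything concrete here is an assumption made for illustration: the student (speech side) is taken to emit frame-level vectors that are mean-pooled into one utterance vector, the teacher (text side) is taken to supply one fixed sentence embedding for the transcript, and the two are compared with a mean squared error. The names `mean_pool` and `alignment_loss` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average frame-level vectors into one utterance-level vector.

    Assumption: mean pooling is only one possible way to summarize the
    student's output; the abstract does not say which is used.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def alignment_loss(
    student_frames: Sequence[Sequence[float]],
    teacher_embedding: Sequence[float],
) -> float:
    """Mean squared error between the pooled student output and the teacher.

    Assumption: MSE stands in for whatever distance actually aligns the
    acoustic and semantic latent spaces in the paper.
    """
    pooled = mean_pool(student_frames)
    if len(pooled) != len(teacher_embedding):
        raise ValueError("student and teacher dimensions must match")
    squared = [(s - t) ** 2 for s, t in zip(pooled, teacher_embedding)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    # Toy values: two speech frames of dimension 2 and a text-side target.
    frames = [[1.0, 2.0], [3.0, 4.0]]
    target = [0.0, 3.0]
    print(alignment_loss(frames, target))
```

In a setup of this kind the teacher would be kept fixed and only the speech-side parameters updated to reduce the loss, so that speech inputs land near the embeddings of their transcripts.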

 
