SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering

Abstract

While end-to-end models for spoken language understanding tasks have been explored recently, there is still no end-to-end model for spoken question answering (SQA) tasks, which would be catastrophically influenced by speech recognition errors. Meanwhile, pre-trained language models, such as BERT, have performed successfully in text question answering. To bring this advantage of pre-trained language models into spoken question answering, we propose SpeechBERT, a cross-modal transformer-based pre-trained language model. Our model can outperform conventional approaches on the dataset which contains both correctly recognized answers and incorrectly recognized answers. Our experimental results show the potential of end-to-end SQA models.
