Abstract
While end-to-end models for spoken language understanding tasks have been explored recently, there is still no end-to-end model for spoken question answering (SQA) tasks, whose performance can be catastrophically degraded by speech recognition errors. Meanwhile, pre-trained language models, such as BERT, have performed successfully in text question answering. To bring this advantage of pre-trained language models to spoken question answering, we propose SpeechBERT, a cross-modal transformer-based pre-trained language model. Our model can outperform conventional approaches on a dataset that contains both correctly and incorrectly recognized answers. Our experimental results show the potential of end-to-end SQA models.