Data Augmentation for Voice-Assistant NLU using BERT-based Interchangeable Rephrase

Abstract

We introduce a data augmentation technique based on byte pair encoding and a BERT-like self-attention model to boost performance on spoken language understanding tasks. We compare and evaluate this method against a range of augmentation techniques encompassing generative models such as VAEs and performance-boosting techniques such as synonym replacement and back-translation. We show that our method performs strongly on domain and intent classification tasks for a voice assistant and in a user study focused on utterance naturalness and semantic similarity.
