Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding

  • 2018-07-04 13:07:53
  • Yutai Hou, Yijia Liu, Wanxiang Che, Ting Liu

Abstract

In this paper, we study the problem of data augmentation for language understanding in task-oriented dialogue systems. In contrast to previous work, which augments an utterance without considering its relation to other utterances, we propose a sequence-to-sequence generation based data augmentation framework that leverages an utterance's semantically equivalent alternatives in the training data. A novel diversity rank is incorporated into the utterance representation to make the model produce diverse utterances, and these diversely augmented utterances help to improve the language understanding module. Experimental results on the Airline Travel Information System (ATIS) dataset and a newly created semantic frame annotation of the Stanford Multi-turn, Multi-domain Dialogue Dataset show that our framework achieves significant improvements of 6.38 and 10.04 F-score, respectively, when only a training set of hundreds of utterances is available. Case studies also confirm that our method generates diverse utterances.
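
The abstract describes the framework only at a high level. The sketch below illustrates one plausible way to build training pairs for such a sequence-to-sequence augmenter, under loudly stated assumptions: utterances are already delexicalized (slot values replaced by slot placeholders), utterances that share a semantic frame are treated as same-semantic alternatives of one another, "diversity" is approximated by word-level edit distance, and the diversity rank is appended to the source as a special token. The function names `edit_distance` and `build_training_pairs`, the `<rank_k>` token format, and the edit-distance criterion are hypothetical illustrations, not the authors' published implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1]


def build_training_pairs(
    data: List[Tuple[str, str]],
) -> List[Tuple[List[str], List[str]]]:
    """Build (source + rank token, target) pairs for a seq2seq augmenter.

    `data` holds (delexicalized utterance, semantic frame) pairs.
    Utterances sharing a frame are treated as alternatives of one another
    (an assumption made for this sketch).
    """
    by_frame: Dict[str, List[List[str]]] = defaultdict(list)
    for utt, frame in data:
        by_frame[frame].append(utt.split())

    pairs: List[Tuple[List[str], List[str]]] = []
    for utts in by_frame.values():
        for i, src in enumerate(utts):
            alts = [u for j, u in enumerate(utts) if j != i]
            # Hypothetical diversity criterion: the alternative that differs
            # most from the source gets rank 1 (sort is stable on ties).
            alts.sort(key=lambda u: edit_distance(src, u), reverse=True)
            for rank, tgt in enumerate(alts, start=1):
                # The rank token tells the decoder how "far" to paraphrase.
                pairs.append((src + [f"<rank_{rank}>"], tgt))
    return pairs


if __name__ == "__main__":
    # Toy, made-up delexicalized examples for illustration only.
    toy = [
        ("show flights from <from_city> to <to_city>", "flight"),
        ("i want to fly from <from_city> to <to_city>", "flight"),
        ("<from_city> to <to_city> flights", "flight"),
    ]
    for src, tgt in build_training_pairs(toy):
        print(" ".join(src), "=>", " ".join(tgt))
```

At generation time, under the same assumptions, one would feed a training utterance together with each rank token to the trained model and collect the decoded outputs as augmented utterances; conditioning on the rank is what lets a single source yield several distinct paraphrases rather than one.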
