Dialogue Transformers - Paper Detail

Abstract

We introduce a dialogue policy based on a transformer architecture, where the self-attention mechanism operates over the sequence of dialogue turns. Recent work has used hierarchical recurrent neural networks to encode multiple utterances in a dialogue context, but we argue that a pure self-attention mechanism is more suitable. By default, an RNN assumes that every item in a sequence is relevant for producing an encoding of the full sequence, but a single conversation can consist of multiple overlapping discourse segments as speakers interleave multiple topics. A transformer picks which turns to include in its encoding of the current dialogue state, and is naturally suited to selectively ignoring or attending to dialogue history. We compare the performance of the Transformer Embedding Dialogue (TED) policy to an LSTM and to the REDP, which was specifically designed to overcome this limitation of RNNs. We show that the TED policy's behaviour compares favourably, both in terms of accuracy and speed.
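The following sketch shows the general idea of self-attention over dialogue turns "selectively ignoring" part of the history. It is an illustrative assumption, not the TED policy itself. Raw turn vectors serve directly as queries, keys, and values, with no learned projections, multiple heads, or featurization. Only turns up to the current one are visible, which mimics a unidirectional mask. The two-dimensional "topic" vectors and the helper name `attend_over_turns` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_turns(turns: List[Vector]) -> Tuple[List[float], Vector]:
    """Hypothetical helper: scaled dot-product attention from the latest turn.

    The last turn is the query. Keys and values are all turns seen so far
    (no future turns), as a unidirectional mask would enforce. Returns the
    attention weights over turns and the weighted "dialogue state" vector.
    """
    query = turns[-1]
    d = len(query)
    scores = [dot(query, key) / math.sqrt(d) for key in turns]
    weights = softmax(scores)
    state = [sum(w * turn[i] for w, turn in zip(weights, turns)) for i in range(d)]
    return weights, state


# Toy history: topic A, an interleaved topic-B turn, then back to topic A.
history = [[3.0, 0.0], [0.0, 3.0], [3.0, 0.0]]
weights, state = attend_over_turns(history)
print([round(w, 4) for w in weights])
```

With this toy history, the interleaved topic-B turn receives an attention weight well below 1%, so it barely contributes to the encoded state. By contrast, a plain RNN folds every turn into its hidden state by default.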
