SUMBT+LaRL: End-to-end Neural Task-oriented Dialog System with Reinforcement Learning

Abstract

Recent neural approaches have remarkably improved each dialog component in task-oriented dialog systems, yet optimizing overall system performance remains a challenge. In this paper, we propose an end-to-end trainable neural dialog system with reinforcement learning, named SUMBT+LaRL. SUMBT+ estimates user-acts as well as dialog belief states, and LaRL models latent system action spaces and generates responses given the estimated contexts. We experimentally demonstrate that a training framework in which SUMBT+ and LaRL are separately pretrained and the entire system is then fine-tuned significantly increases dialog success rates. We propose new success criteria for applying reinforcement learning to the end-to-end dialog system, and we provide an experimental analysis of how the results differ depending on the success criteria and evaluation methods. Consequently, our model achieved a new state-of-the-art success rate of 85.4% on corpus-based evaluation, and a comparable success rate of 81.40% on the simulator-based evaluation provided by the DSTC8 challenge.
