SUMBT+LaRL: End-to-end Neural Task-oriented Dialog System with Reinforcement Learning

Abstract

Recent neural approaches have greatly improved the individual dialog components of task-oriented dialog systems, yet optimizing overall system performance remains a challenge. In this paper, we propose an end-to-end trainable neural dialog system with reinforcement learning, named SUMBT+LaRL. SUMBT+ estimates user acts as well as dialog belief states, and LaRL models latent system action spaces and generates responses given the estimated contexts. We experimentally demonstrate that a training framework in which SUMBT+ and LaRL are pretrained separately and the entire system is then fine-tuned significantly increases dialog success rates. We propose new success criteria for applying reinforcement learning to the end-to-end dialog system and provide an experimental analysis of how results differ depending on the success criteria and evaluation methods. Consequently, our model achieves a new state-of-the-art success rate of 85.4% on corpus-based evaluation and a comparable success rate of 81.40% on the simulator-based evaluation provided by the DSTC8 challenge.
