Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems

Abstract

In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on supervised pre-training models. The efficiency of such learning methods may suffer from the mismatch of dialogue state distribution between the offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interaction with users by learning from human teaching and feedback. We design a neural network based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistakes it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent's capability in successfully completing a task.
