Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems

Abstract

Dialogue policy learning for task-oriented dialogue systems has made great progress recently, mostly through reinforcement learning methods. However, these approaches have become very sophisticated, and it is time to re-evaluate them. Are we really making progress in developing dialogue agents based solely on reinforcement learning? We demonstrate how (1) traditional supervised learning together with (2) a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods. First, we introduce a simple dialogue action decoder to predict the appropriate actions. Then, the traditional multi-label classification solution for dialogue policy learning is extended by adding dense layers to improve the dialogue agent's performance. Finally, we employ the Gumbel-Softmax estimator to alternately train the dialogue agent and the dialogue reward model without using reinforcement learning. Based on our extensive experimentation, we conclude that the proposed methods achieve more stable and higher performance with less effort, avoiding, for example, the domain knowledge required to design a user simulator and the intractable parameter tuning of reinforcement learning. Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the roles of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
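The Gumbel-Softmax estimator replaces a non-differentiable sample from a categorical distribution (for example, over candidate dialogue actions) with a continuous relaxation, which is what allows a reward model's signal to be passed back to the agent without a policy-gradient step. The sketch below shows only the sampling step, in plain Python. The function name `gumbel_softmax`, its parameters, and the example logits are hypothetical and not taken from the paper; a real training loop would run this inside an autodiff framework (PyTorch, for instance, provides `torch.nn.functional.gumbel_softmax`).

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=None, hard=False):
    """Relaxed one-hot sample from a categorical distribution (sketch).

    Hypothetical helper for illustration: adds i.i.d. Gumbel(0, 1) noise
    to each logit, divides by `temperature`, and applies a softmax.
    Lower temperatures push the result closer to a one-hot vector.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng if rng is not None else random.Random()
    noisy = []
    for logit in logits:
        # Clamp u into (0, 1) so that -log(-log(u)) stays finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / temperature)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    # "Hard" variant: the forward value is the one-hot argmax of the soft
    # sample. In an autodiff framework, a straight-through estimator would
    # pair this forward value with the gradient of `soft`; plain Python
    # has no gradients, so only the forward value is returned here.
    best = soft.index(max(soft))
    return [1.0 if i == best else 0.0 for i in range(len(soft))]


# Usage: a relaxed choice over three hypothetical dialogue actions.
rng = random.Random(0)
print(gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng))
```

The temperature is the main design knob: higher values give smoother, lower-variance but more biased samples, while values near zero approach a true categorical draw.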
