Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

Abstract

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art.
