Post-processing Networks: Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning

Abstract

Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training modules in the system using reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented using trainable neural-based methods. To solve this problem, we propose a method for optimizing the dialogue performance of a pipeline system composed of modules implemented with arbitrary methods. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated by reinforcement learning to improve the overall dialogue performance of the system, without requiring each module to be differentiable. Through dialogue simulation and human evaluation on the MultiWOZ dataset, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules.
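
The idea of post-processing a non-differentiable module with a trainable component can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the module output is encoded as a two-bit vector, the hypothetical `rule_based_module` always drops the second bit, the `PPN` class is a single logistic unit per bit, and the update is plain REINFORCE with a reward of +1 or -1 standing in for dialogue success. The actual PPN architecture and reinforcement learning algorithm may differ.

```python
import math
import random


def rule_based_module(user_bits):
    """Hypothetical non-trainable module: copies bit 0 but always drops bit 1."""
    return [user_bits[0], 0]


class PPN:
    """Toy post-processing network (assumed form): one logistic unit per output bit."""

    def __init__(self, size):
        self.w = [0.0] * size
        self.b = [0.0] * size

    def probs(self, x):
        # Probability that each post-processed bit is 1, given module output x.
        return [1.0 / (1.0 + math.exp(-(self.w[i] * x[i] + self.b[i])))
                for i in range(len(x))]

    def sample(self, x, rng):
        p = self.probs(x)
        y = [1 if rng.random() < p_i else 0 for p_i in p]
        return y, p

    def greedy(self, x):
        return [1 if p_i > 0.5 else 0 for p_i in self.probs(x)]

    def reinforce_update(self, x, y, p, reward, lr):
        for i in range(len(x)):
            # Gradient of log Bernoulli(y_i; p_i) with respect to the logit.
            grad_logit = y[i] - p[i]
            self.w[i] += lr * reward * grad_logit * x[i]
            self.b[i] += lr * reward * grad_logit


def train(ppn, steps=3000, lr=0.1, seed=0):
    """Update the PPN from a scalar reward only; the module is never differentiated."""
    rng = random.Random(seed)
    target = [1, 1]  # what the downstream system needs (assumed toy goal)
    for _ in range(steps):
        x = rule_based_module(target)       # fixed, non-differentiable module
        y, p = ppn.sample(x, rng)           # PPN post-processes its output
        reward = 1.0 if y == target else -1.0
        ppn.reinforce_update(x, y, p, reward, lr)
    return ppn


if __name__ == "__main__":
    ppn = train(PPN(2))
    print(rule_based_module([1, 1]), ppn.greedy(rule_based_module([1, 1])))
```

The point of the sketch is that the reward signal reaches only the PPN parameters, so the wrapped module can be a rule-based or otherwise non-trainable component.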
