Feudal Reinforcement Learning for Dialogue Management in Large Domains

Abstract

Reinforcement learning (RL) is a promising approach to solving dialogue policy optimisation. Traditional RL algorithms, however, fail to scale to large domains due to the curse of dimensionality. We propose a novel Dialogue Management architecture, based on Feudal RL, which decomposes the decision into two steps: a first step where a master policy selects a subset of primitive actions, and a second step where a primitive action is chosen from the selected subset. The structural information included in the domain ontology is used to abstract the dialogue state space, taking the decisions at each step using different parts of the abstracted state. This, combined with an information sharing mechanism between slots, increases the scalability to large domains. We show that an implementation of this approach, based on Deep-Q Networks, significantly outperforms the previous state of the art in several dialogue domains and environments, without the need for any additional reward signal.
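The two-step decision described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`greedy`, `feudal_select`), the subset names (`general`, `slot`), the action names, and the toy Q-functions are all hypothetical. Greedy selection over hand-written Q-values stands in for the Deep-Q Networks, and each policy is simply passed its own part of the abstracted state.

```python
from typing import Callable, Dict, Sequence, Tuple

# A Q-function maps a state feature vector to one Q-value per option.
# (Hypothetical interface; in the paper these would be Deep-Q Networks.)
QFunction = Callable[[Sequence[float]], Dict[str, float]]


def greedy(q_values: Dict[str, float]) -> str:
    """Return the option with the highest Q-value."""
    return max(q_values, key=lambda name: q_values[name])


def feudal_select(
    master_q: QFunction,
    sub_qs: Dict[str, QFunction],
    master_state: Sequence[float],
    sub_states: Dict[str, Sequence[float]],
) -> Tuple[str, str]:
    """Two-step action selection.

    Step 1: the master policy picks a subset of primitive actions.
    Step 2: the sub-policy for that subset picks one primitive action,
            using its own part of the abstracted state.
    """
    subset = greedy(master_q(master_state))
    action = greedy(sub_qs[subset](sub_states[subset]))
    return subset, action


# Toy Q-functions (assumed for illustration only).
def toy_master_q(s: Sequence[float]) -> Dict[str, float]:
    return {"general": s[0], "slot": s[1]}


toy_sub_qs: Dict[str, QFunction] = {
    "general": lambda s: {"hello": s[0], "bye": s[1]},
    "slot": lambda s: {"request": s[0], "confirm": s[1]},
}

result = feudal_select(
    toy_master_q,
    toy_sub_qs,
    master_state=[0.2, 0.9],
    sub_states={"general": [0.5, 0.1], "slot": [0.3, 0.7]},
)
print(result)  # ('slot', 'confirm')
```

The point of the decomposition is that each policy only ranks a small set of options over a small state, rather than one policy ranking every primitive action over the full dialogue state.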
