Option Compatible Reward Inverse Reinforcement Learning

Abstract

Reinforcement learning with complex tasks is a challenging problem. Often, expert demonstrations of complex multitasking operations are required to train agents. However, it is difficult to design a reward function for given complex tasks. In this paper, we solve a hierarchical inverse reinforcement learning (IRL) problem within the framework of options. A gradient method for parametrized options is used to deduce a defining equation for the Q-feature space, which leads to a reward feature space. Using a second-order optimality condition for option parameters, an optimal reward function is selected. Experimental results in both discrete and continuous domains confirm that our segmented rewards provide a solution to the IRL problem for multitasking operations and show good performance and robustness against the noise created by expert demonstrations.
