Hybrid Reinforcement Learning with Expert State Sequences

Abstract

Existing imitation learning approaches often require that the complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions of the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluate our hybrid approach on an illustrative domain and Atari games. The empirical results show that (1) the agents are able to leverage expert state sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.
