Hybrid Adversarial Inverse Reinforcement Learning

Abstract

Learning from demonstrations and then outperforming the demonstrator is the advanced target of inverse reinforcement learning (IRL), which is termed beyond-demonstrator (BD)-IRL. BD-IRL provides an entirely new method to build expert systems, one that avoids the dilemma of reward function design and reduces computation costs. Currently, most BD-IRL algorithms are two-stage: they first infer a reward function and then learn the policy via reinforcement learning (RL). Because of these two separate procedures, two-stage algorithms have high computational complexity and low robustness. To overcome these flaws, we propose a BD-IRL framework entitled hybrid adversarial inverse reinforcement learning (HAIRL), which successfully integrates reward learning and exploration into one procedure. The simulation results show that HAIRL is more efficient and robust than other similar state-of-the-art (SOTA) algorithms.
