Abstract
Beyond-demonstrator (BD) extrapolation through inverse reinforcement learning (IRL) aims to learn from and outperform the demonstrator. In sharp contrast to conventional reinforcement learning (RL) algorithms, BD-IRL can overcome the dilemma incurred in reward function design and improve the exploration mechanism of RL, which opens new avenues to building superior expert systems. Most existing BD-IRL algorithms operate in two stages, first inferring a reward function and then learning a policy via RL. However, such two-stage BD-IRL algorithms suffer from high computational complexity, weak robustness, and large performance variations. In particular, a poor reward function derived in the first stage will inevitably incur severe performance loss in the second stage. In this work, we propose a hybrid adversarial inverse reinforcement learning (HAIRL) algorithm that is one-stage, model-free, generative-adversarial (GA), and curiosity-driven. Thanks to the one-stage design, HAIRL integrates reward function learning and policy optimization into one procedure, which leads to advantages such as low computational complexity, high robustness, and strong adaptability. More specifically, HAIRL simultaneously imitates the demonstrator and explores BD performance by utilizing hybrid rewards. In particular, the Wasserstein-1 distance (WD) is introduced into HAIRL to stabilize the imitation procedure, while a novel end-to-end curiosity module (ECM) is developed to improve exploration. Finally, extensive simulation results confirm that HAIRL achieves higher performance than other similar BD-IRL algorithms. Our code is available on GitHub\footnote{\url{https://github.com/yuanmingqi/HAIRL}}.