Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance

Abstract

In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most existing RLfD methods require demonstrations to be perfect and sufficient, a requirement that is unrealistic in practice. To work with imperfect demonstrations, we first formally define an imperfect expert setting for RLfD, and then show that previous methods suffer from two issues, one concerning optimality and the other concerning convergence. Building on our theoretical findings, we tackle these two issues by treating the expert guidance as a soft constraint that regulates the policy exploration of the agent, which leads to a constrained optimization problem. We further demonstrate that this problem can be solved efficiently by performing a local linear search on its dual form. Extensive empirical evaluations on a comprehensive collection of benchmarks indicate that our method attains consistent improvement over other RLfD counterparts.
