Learning Transferable Domain Priors for Safe Exploration in Reinforcement Learning

Abstract

Prior access to domain knowledge could significantly improve the performance of a reinforcement learning agent. In particular, it could help agents avoid potentially catastrophic exploratory actions, which would otherwise have to be experienced during learning. In this work, we identify consistently undesirable actions in a set of previously learned tasks, and use pseudo-rewards associated with them to learn a prior policy. In addition to enabling safe exploratory behaviors in subsequent tasks in the domain, these priors are transferable to similar environments, and can be learned off-policy and in parallel with the learning of other tasks in the domain. We compare our approach to established, state-of-the-art algorithms in a grid-world navigation environment, and demonstrate that it exhibits superior performance with respect to avoiding unsafe actions while learning to perform arbitrary tasks in the domain. We also present some theoretical analysis to support these results, and discuss the implications and some alternative formulations of this approach, which could also be useful to accelerate learning in certain scenarios.
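
The pipeline summarized in the abstract (flag actions that are undesirable across previously learned tasks, attach pseudo-rewards to them, learn a prior, and use it to restrict exploration) can be illustrated with a minimal tabular sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the `margin` test used to decide that an action is "consistently undesirable", the pseudo-reward of -1, the one-step update without bootstrapping, and the toy Q-tables.

```python
import random

def consistently_undesirable(q_tables, margin=0.5):
    """Flag (state, action) pairs that trail the best action in that state
    by more than `margin` in EVERY task (an assumed criterion)."""
    n_states = len(q_tables[0])
    n_actions = len(q_tables[0][0])
    flagged = set()
    for s in range(n_states):
        for a in range(n_actions):
            if all(max(q[s]) - q[s][a] > margin for q in q_tables):
                flagged.add((s, a))
    return flagged

def pseudo_reward(state, action, flagged):
    # Assumed pseudo-reward: -1 for flagged pairs, 0 otherwise.
    return -1.0 if (state, action) in flagged else 0.0

def learn_prior(flagged, n_states, n_actions, alpha=0.5, sweeps=20):
    """Learn prior values from pseudo-rewards alone.
    Simplification: a one-step update toward the pseudo-reward, no bootstrapping."""
    prior_q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s in range(n_states):
            for a in range(n_actions):
                r = pseudo_reward(s, a, flagged)
                prior_q[s][a] += alpha * (r - prior_q[s][a])
    return prior_q

def safe_explore(state, prior_q, rng, threshold=-0.5):
    """Sample an exploratory action, skipping those the prior marks as unsafe."""
    allowed = [a for a, v in enumerate(prior_q[state]) if v > threshold]
    if not allowed:  # fall back to all actions if everything is masked
        allowed = list(range(len(prior_q[state])))
    return rng.choice(allowed)

# Toy Q-tables for two previously learned tasks (2 states, 3 actions each).
q_task_a = [[1.0, 0.9, -5.0], [0.2, 0.8, 0.7]]
q_task_b = [[0.3, 1.2, -4.0], [0.9, 0.1, 0.6]]

flagged = consistently_undesirable([q_task_a, q_task_b])
prior_q = learn_prior(flagged, n_states=2, n_actions=3)
rng = random.Random(0)
samples = [safe_explore(0, prior_q, rng) for _ in range(100)]
```

In this toy setup only action 2 in state 0 is poor in both tasks, so it is the only pair flagged, and exploratory sampling in state 0 draws from actions 0 and 1 alone. Actions that are poor in just one task stay available, since they may still be useful for a new task in the domain.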
