Abstract
Safe Reinforcement Learning (Safe RL) aims to ensure safety while an RL agent learns by interacting with real-world environments, where improper actions can incur high costs or lead to severe consequences. In this paper, we propose a novel Safe Skill Planning (SSkP) approach that makes safe RL more effective by exploiting auxiliary offline demonstration data. SSkP involves a two-stage process. First, we employ positive-unlabeled (PU) learning to learn a skill risk predictor from the offline demonstration data. Then, based on the learned skill risk predictor, we develop a novel risk planning process to enhance online safe RL and efficiently learn a risk-averse safe policy through interactions with the online RL environment, while simultaneously adapting the skill risk predictor to the environment. We conduct experiments in several benchmark robotic simulation environments. The experimental results demonstrate that the proposed approach consistently outperforms previous state-of-the-art safe RL methods.