Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees

Abstract

Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to safely close the reality gap. To improve safety, we apply a dual policy setup where a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the reach-avoid Bellman Equation based on Hamilton-Jacobi reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments including a photo-realistic one. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
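
To make the shielding idea concrete, the following is a minimal sketch of a supervisory control step under the dual policy setup, not the implementation used in the paper. The names `shielded_action`, `performance_policy`, `backup_policy`, `safety_critic`, and `threshold` are hypothetical, and the sign convention (a critic value above the threshold flags the proposed action as unsafe) is an assumption made for illustration. The toy one-dimensional policies at the bottom are likewise invented.

```python
from typing import Callable, Sequence, Tuple

Observation = Sequence[float]
Action = Tuple[float, ...]


def shielded_action(
    obs: Observation,
    performance_policy: Callable[[Observation], Action],
    backup_policy: Callable[[Observation], Action],
    safety_critic: Callable[[Observation, Action], float],
    threshold: float = 0.0,
) -> Tuple[Action, bool]:
    """Return the action to execute and whether the shield intervened.

    Assumed convention: safety_critic(obs, action) > threshold means the
    proposed action is predicted to be unsafe.
    """
    proposed = performance_policy(obs)
    if safety_critic(obs, proposed) > threshold:
        # Override the task-driven action with the backup (safety) action.
        return backup_policy(obs), True
    return proposed, False


# Toy 1-D example (all hypothetical): the robot sits at position obs[0]
# and an obstacle margin starts at x = 0.8.
def toy_performance_policy(obs: Observation) -> Action:
    return (0.5,)  # always step forward


def toy_backup_policy(obs: Observation) -> Action:
    return (-0.5,)  # always step back


def toy_safety_critic(obs: Observation, action: Action) -> float:
    # Positive when the predicted next position crosses the margin.
    return (obs[0] + action[0]) - 0.8


far_action, far_shielded = shielded_action(
    [0.0], toy_performance_policy, toy_backup_policy, toy_safety_critic
)
near_action, near_shielded = shielded_action(
    [0.5], toy_performance_policy, toy_backup_policy, toy_safety_critic
)
```

In this sketch the performance policy's action is executed far from the obstacle, while near the obstacle the backup action replaces it. In a learned setting, the two policies and the critic would be neural networks rather than the hand-written functions shown here.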
