Abstract
Reinforcement learning algorithms need exploration to learn. However, unsupervised exploration prevents the deployment of such algorithms on safety-critical tasks and limits their use in real-world applications. In this paper, we propose a new algorithm called Ensemble Model Predictive Safety Certification that combines model-based deep reinforcement learning with tube-based model predictive control to correct the actions taken by a learning agent, keeping safety constraint violations at a minimum through planning. Our approach aims to reduce the amount of prior knowledge about the actual system by requiring only offline data generated by a safe controller. Our results show that we can achieve significantly fewer constraint violations than comparable reinforcement learning methods.
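The core idea of correcting a learning agent's action can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm. It assumes a hypothetical scalar linear system x' = a*x + b*u + w with |w| <= w_max, an ensemble given as a list of (a, b) pairs, and a state constraint |x| <= x_max. The constraint is tightened by w_max, in the spirit of a tube, and the action must be safe for every ensemble member. Planning is reduced to a single step, so recursive feasibility is not guaranteed.

```python
def certify_action(x, u_learn, models, x_max, u_max, w_max):
    """Hypothetical one-step safety filter (illustrative only).

    Returns the action closest to u_learn that keeps the nominal next state
    of every ensemble model (a, b) inside the tightened bound
    |a*x + b*u| <= x_max - w_max, which guarantees |x'| <= x_max for any
    disturbance |w| <= w_max. Returns None if no such action exists.
    """
    bound = x_max - w_max      # tube-style constraint tightening
    lo, hi = -u_max, u_max     # actuator limits
    for a, b in models:
        if b == 0:
            raise ValueError("each ensemble model needs a nonzero input gain b")
        # Solve -bound <= a*x + b*u <= bound for u.
        l = (-bound - a * x) / b
        h = (bound - a * x) / b
        if b < 0:
            l, h = h, l        # dividing by a negative b flips the inequality
        # Worst case over the ensemble: intersect the admissible intervals.
        lo, hi = max(lo, l), min(hi, h)
    if lo > hi:
        return None            # no certified action; a backup controller must act
    # Minimal correction: project the learner's action onto [lo, hi].
    return min(max(u_learn, lo), hi)
```

For example, with `models=[(1.0, 0.5), (1.1, 0.4)]`, `x=0.8`, `x_max=1.0`, `u_max=2.0` and `w_max=0.1`, a proposed action of `1.5` is clipped to about `0.05`, while `-0.5` passes through unchanged. Taking the worst case over the ensemble is one simple way to account for model uncertainty; the intersection shrinks as the models disagree more.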