ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Abstract

Reinforcement learning (RL) is ubiquitous in the development of modern AI systems. However, state-of-the-art RL agents require extensive, and potentially unsafe, interactions with their environments to learn effectively. These limitations confine RL agents to simulated environments, hindering their ability to learn directly in real-world settings. In this work, we present ActSafe, a novel model-based RL algorithm for safe and efficient exploration. ActSafe learns a well-calibrated probabilistic model of the system and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics, while enforcing pessimism w.r.t. the safety constraints. Under regularity assumptions on the constraints and dynamics, we show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time. In addition, we propose a practical variant of ActSafe that builds on the latest model-based RL advancements and enables safe exploration even in high-dimensional settings such as visual control. We empirically show that ActSafe obtains state-of-the-art performance in difficult exploration tasks on standard safe deep RL benchmarks while ensuring safety during learning.
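
The optimism/pessimism principle described above can be illustrated with a small sketch. It is not ActSafe's planner: it assumes, hypothetically, that an ensemble of learned models has already produced per-member predictions of return and constraint cost for a fixed list of candidate plans, and it approximates epistemic uncertainty by the ensemble's standard deviation. The names `PlanEstimate`, `epistemic_bounds`, `select_plan`, `beta` (confidence width) and `budget` (cost limit) are illustrative only. Returns are scored with an upper confidence bound (optimism), while a plan is admitted only if the upper confidence bound on its cost stays within the budget (pessimism).

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class PlanEstimate:
    """Hypothetical container: ensemble predictions for one candidate plan."""
    name: str
    returns: Sequence[float]  # predicted return from each ensemble member
    costs: Sequence[float]    # predicted constraint cost from each ensemble member


def epistemic_bounds(samples: Sequence[float], beta: float) -> Tuple[float, float]:
    """(lower, upper) = mean -/+ beta * std across ensemble members."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    return mean - beta * std, mean + beta * std


def select_plan(
    candidates: List[PlanEstimate], budget: float, beta: float = 1.0
) -> Optional[PlanEstimate]:
    """Highest optimistic return among plans whose pessimistic cost is within budget."""
    best, best_value = None, float("-inf")
    for plan in candidates:
        _, optimistic_return = epistemic_bounds(plan.returns, beta)
        _, pessimistic_cost = epistemic_bounds(plan.costs, beta)
        if pessimistic_cost > budget:
            continue  # safety cannot be certified under current uncertainty
        if optimistic_return > best_value:
            best, best_value = plan, optimistic_return
    return best  # None when no candidate is certifiably safe


# Usage with made-up ensemble predictions (three members per plan).
candidates = [
    PlanEstimate("cautious", returns=[1.0, 1.1, 0.9], costs=[0.1, 0.1, 0.1]),
    PlanEstimate("explore", returns=[0.2, 2.0, 0.5], costs=[0.2, 0.3, 0.25]),
    PlanEstimate("risky", returns=[3.0, 3.2, 2.9], costs=[0.5, 1.5, 1.0]),
]
print(select_plan(candidates, budget=0.5, beta=1.0).name)  # explore
print(select_plan(candidates, budget=0.5, beta=0.0).name)  # cautious
```

With `beta=1.0` the sketch prefers the uncertain "explore" plan even though its mean return is lower, because its optimistic return is higher; with `beta=0.0` it falls back to the greedy "cautious" plan. In both cases "risky" is rejected because its pessimistic cost exceeds the budget.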
