Abstract
Safety is a critical concern when deploying reinforcement learning agents for realistic tasks. Recently, safe reinforcement learning algorithms have been developed to optimize the agent's performance while avoiding violations of safety constraints. However, few studies have addressed non-stationary disturbances in the environment, which may cause catastrophic outcomes. In this paper, we propose the context-aware safe reinforcement learning (CASRL) method, a meta-learning framework for safe adaptation in non-stationary environments. We use a probabilistic latent variable model to rapidly infer the posterior distribution of environment transitions from the context data. Safety constraints are then evaluated with uncertainty-aware trajectory sampling. Because safety violations are costly, unsafe records are rare in the dataset. We address this issue by using prioritized sampling during model training and by formulating prior safety constraints from domain knowledge during constrained planning. The algorithm is evaluated in realistic safety-critical environments with non-stationary disturbances. Results show that the proposed algorithm significantly outperforms existing baselines in terms of safety and robustness.
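The abstract does not specify how prioritized sampling is carried out. The following minimal Python sketch shows one common form, assuming weighted sampling with replacement in which transitions with nonzero safety cost are upweighted so that rare unsafe records appear more often in training minibatches. The `Transition` record, the convention that `cost > 0` marks an unsafe record, and the `unsafe_weight` hyperparameter are hypothetical illustrations, not the CASRL scheme itself.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    """One recorded step. Hypothetical convention: cost > 0 marks a safety violation."""
    state: float
    action: float
    next_state: float
    cost: float


def sampling_probs(buffer, unsafe_weight=10.0):
    """Per-transition sampling probabilities that upweight unsafe records.

    `unsafe_weight` is a hypothetical hyperparameter: each unsafe transition is
    `unsafe_weight` times as likely to be drawn as a safe one.
    """
    weights = [unsafe_weight if t.cost > 0 else 1.0 for t in buffer]
    total = sum(weights)
    return [w / total for w in weights]


def prioritized_batch(buffer, batch_size, unsafe_weight=10.0, seed=None):
    """Draw a training minibatch, with replacement, using the weights above."""
    rng = random.Random(seed)
    return rng.choices(buffer, weights=sampling_probs(buffer, unsafe_weight), k=batch_size)


if __name__ == "__main__":
    # Toy buffer: 99 safe transitions and a single unsafe one.
    buffer = [Transition(float(i), 0.0, float(i + 1), 0.0) for i in range(99)]
    buffer.append(Transition(99.0, 1.0, 100.0, 1.0))
    probs = sampling_probs(buffer, unsafe_weight=99.0)
    print(f"P(unsafe) = {probs[-1]:.2f}")  # 99 / (99 + 99) = 0.50
    batch = prioritized_batch(buffer, batch_size=1000, unsafe_weight=99.0, seed=0)
    print(sum(t.cost > 0 for t in batch), "unsafe draws out of 1000")
```

With uniform sampling the single unsafe transition would be drawn with probability 0.01; with `unsafe_weight=99.0` it is drawn half the time. In practice the weight would be tuned so that the model sees enough unsafe transitions without overfitting to them.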