Safe Reinforcement Learning with Dual Robustness

Abstract

Reinforcement learning (RL) agents are vulnerable to adversarial disturbances, which can deteriorate task performance or compromise safety specifications. Existing methods either address safety requirements under the assumption of no adversary (e.g., safe RL) or focus only on robustness against performance adversaries (e.g., robust RL). Learning one policy that is both safe and robust remains a challenging open problem. The difficulty is how to tackle two intertwined aspects in the worst cases: feasibility and optimality. Optimality is only valid inside a feasible region, while identification of the maximal feasible region must rely on learning the optimal policy. To address this issue, we propose a systematic framework to unify safe RL and robust RL, including problem formulation, iteration scheme, convergence analysis, and practical algorithm design. This unification is built upon constrained two-player zero-sum Markov games. A dual policy iteration scheme is proposed, which simultaneously optimizes a task policy and a safety policy. The convergence of this iteration scheme is proved. Furthermore, we design a deep RL algorithm for practical implementation, called dually robust actor-critic (DRAC). Evaluations on safety-critical benchmarks demonstrate that DRAC achieves high performance and persistent safety under all scenarios (no adversary, safety adversary, performance adversary), outperforming all baselines significantly.
