DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

Abstract

Deep reinforcement learning (RL) has achieved significant success, yet its application in real-world scenarios is often hindered by a lack of robustness to environmental uncertainties. To address this challenge, some robust RL algorithms have been proposed, but most are limited to tabular settings. In this work, we propose Distributionally Robust Soft Actor-Critic (DR-SAC), a novel algorithm designed to enhance the robustness of the state-of-the-art Soft Actor-Critic (SAC) algorithm. DR-SAC aims to maximize the expected value with entropy against the worst possible transition model lying in an uncertainty set. A distributionally robust version of the soft policy iteration is derived with a convergence guarantee. For settings where nominal distributions are unknown, such as offline RL, a generative modeling approach is proposed to estimate the required nominal distributions from data. Furthermore, experimental results on a range of continuous control benchmark tasks demonstrate that our algorithm achieves up to $9.8$ times the average reward of the SAC baseline under common perturbations. Additionally, compared with existing robust reinforcement learning algorithms, DR-SAC significantly improves computing efficiency and applicability to large-scale problems.
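
As a rough illustration of the objective described above, the max-min problem can be sketched in the following form. The notation is an assumption made here for illustration and is not taken from the paper: $\pi$ is the policy, $\mathcal{P}$ the uncertainty set of transition models, $\gamma \in (0,1)$ a discount factor, $\alpha > 0$ an entropy temperature, $r$ the reward function, and $\mathcal{H}$ the Shannon entropy of the policy at a state. The paper's exact formulation, including how $\mathcal{P}$ is constructed around the nominal distribution, may differ.

$$
\max_{\pi} \; \min_{p \in \mathcal{P}} \; \mathbb{E}_{p,\,\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right]
$$

Under this reading, standard SAC corresponds to the special case in which $\mathcal{P}$ contains only the nominal transition model, so the inner minimization is trivial.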
