RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

Abstract

Offline reinforcement learning (RL) provides a promising direction to exploit the massive amount of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when encountering observation deviation under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these OOD states. Theoretically, we show RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
