A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Low-Rank MDPs

Abstract

Offline reinforcement learning (RL) aims to learn a policy that maximizes theexpected cumulative reward using a pre-collected dataset. Offline RL withlow-rank MDPs or general function approximation has been widely studiedrecently, but existing algorithms with sample complexity $O(\epsilon^{-2})$ forfinding an $\epsilon$-optimal policy either require a uniform data coverageassumptions or are computationally inefficient. In this paper, we propose aprimal dual algorithm for offline RL with low-rank MDPs in the discountedinfinite-horizon setting. Our algorithm is the first computationally efficientalgorithm in this setting that achieves sample complexity of $O(\epsilon^{-2})$with partial data coverage assumption. This improves upon a recent work thatrequires $O(\epsilon^{-4})$ samples. Moreover, our algorithm extends theprevious work to the offline constrained RL setting by supporting constraintson additional reward signals.

Quick Read (beta)

loading the full paper ...