Abstract
We address private deep offline reinforcement learning (RL), where the goal is to train a policy on standard control tasks that is differentially private (DP) with respect to individual trajectories in the dataset. To achieve this, we introduce PriMORL, a model-based RL algorithm with formal differential privacy guarantees. PriMORL first learns an ensemble of trajectory-level DP models of the environment from offline data. It then optimizes a policy on the penalized private model, without any further interaction with the system or access to the dataset. In addition to offering strong theoretical foundations, we demonstrate empirically that PriMORL enables the training of private RL agents on offline continuous control tasks with deep function approximations, whereas current methods are limited to simpler tabular and linear Markov Decision Processes (MDPs). We furthermore outline the trade-offs involved in achieving privacy in this setting.
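The two ingredients named above, trajectory-level DP model training and a penalized model, can be illustrated with a minimal sketch. It rests on two assumptions that the abstract does not confirm. First, trajectory-level privacy is obtained with DP-SGD-style aggregation in which each clipped gradient is the model-loss gradient summed over one whole trajectory, so a single trajectory has bounded influence. Second, the penalty follows the MOPO form: the mean ensemble reward minus `lam` times the largest predicted standard-deviation norm. The function names `private_trajectory_gradient` and `penalized_reward` are hypothetical, and this is not PriMORL's actual implementation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def private_trajectory_gradient(
    per_trajectory_grads: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> Vector:
    """Hypothetical trajectory-level DP-SGD aggregation step.

    Each entry of `per_trajectory_grads` is assumed to be the gradient of the
    model loss summed over all transitions of ONE trajectory. Clipping it to
    `clip_norm` bounds the contribution of a whole trajectory, not just a
    single transition. Gaussian noise with std `noise_multiplier * clip_norm`
    is then added to the sum before averaging over the batch.
    """
    dim = len(per_trajectory_grads[0])
    total = [0.0] * dim
    for g in per_trajectory_grads:
        norm = l2_norm(g)
        # Scale down only when the trajectory gradient exceeds the clip norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            total[i] += g[i] * scale
    sigma = noise_multiplier * clip_norm
    batch = len(per_trajectory_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch for t in total]


def penalized_reward(
    model_rewards: Sequence[float],
    model_stds: Sequence[Sequence[float]],
    lam: float,
) -> float:
    """Uncertainty-penalized reward (assumed MOPO-style penalty).

    `model_rewards[k]` is ensemble member k's predicted reward for (s, a);
    `model_stds[k]` is its predicted next-state standard deviation.
    Returns mean reward minus lam * max_k ||sigma_k(s, a)||_2.
    """
    mean_reward = sum(model_rewards) / len(model_rewards)
    uncertainty = max(l2_norm(s) for s in model_stds)
    return mean_reward - lam * uncertainty
```

With `noise_multiplier=0.0` the aggregation reduces to plain per-trajectory clipping and averaging, which makes the clipping behaviour easy to check; in a private run the noise multiplier would be calibrated to the target privacy budget by a DP accountant, which this sketch omits.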