Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning

Abstract

Offline safe reinforcement learning (OSRL) involves learning a decision-making policy from a fixed batch of training data that maximizes rewards while satisfying pre-defined safety constraints. However, adapting to varying safety constraints during deployment without retraining remains an under-explored challenge. To address this challenge, we introduce constraint-adaptive policy switching (CAPS), a wrapper framework around existing offline RL algorithms. During training, CAPS uses offline data to learn multiple policies with a shared representation that optimize different reward and cost trade-offs. During testing, CAPS switches between those policies by selecting, at each state, the policy that maximizes future rewards among those that satisfy the current cost constraint. Our experiments on 38 tasks from the DSRL benchmark demonstrate that CAPS consistently outperforms existing methods, establishing a strong wrapper-based baseline for OSRL. The code is publicly available at https://github.com/yassineCh/CAPS.
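
The test-time switching rule described above can be illustrated with a minimal sketch. It assumes that, at the current state, each learned policy comes with an estimate of its future reward and its future cost; how those estimates are obtained is not specified in the abstract. The names `select_policy`, `reward_values`, `cost_values`, and `cost_budget` are hypothetical, and the fallback used when no policy is feasible (choose the lowest estimated cost) is an assumption of this sketch, not a detail taken from the paper.

```python
import numpy as np


def select_policy(reward_values, cost_values, cost_budget):
    """Return the index of the policy to follow at the current state.

    reward_values[i]: estimated future reward of policy i (assumed given).
    cost_values[i]:   estimated future cost of policy i (assumed given).
    cost_budget:      cost constraint currently in force.
    """
    reward_values = np.asarray(reward_values, dtype=float)
    cost_values = np.asarray(cost_values, dtype=float)

    # Policies whose estimated future cost fits within the budget.
    feasible = cost_values <= cost_budget

    # Assumption of this sketch: if nothing is feasible, fall back to
    # the policy with the lowest estimated cost.
    if not feasible.any():
        return int(np.argmin(cost_values))

    # Among feasible policies, pick the one with the highest estimated reward.
    masked_rewards = np.where(feasible, reward_values, -np.inf)
    return int(np.argmax(masked_rewards))


# Three hypothetical policies, ordered from reward-seeking to cautious.
chosen = select_policy([10.0, 7.0, 3.0], [8.0, 4.0, 1.0], cost_budget=5.0)
```

Because the budget is only an argument to the selection step, the same set of trained policies can be reused under a different constraint by calling the function with a different `cost_budget`, which is the sense in which no retraining is needed.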
