Abstract
In constrained reinforcement learning (RL), a learning agent seeks to notonly optimize the overall reward but also satisfy the additional safety,diversity, or budget constraints. Consequently, existing constrained RLsolutions require several new algorithmic ingredients that are notablydifferent from standard RL. On the other hand, reward-free RL is independentlydeveloped in the unconstrained literature, which learns the transition dynamicswithout using the reward information, and thus naturally capable of addressingRL with multiple objectives under the common dynamics. This paper bridgesreward-free RL and constrained RL. Particularly, we propose a simplemeta-algorithm such that given any reward-free RL oracle, the approachabilityand constrained RL problems can be directly solved with negligible overheads insample complexity. Utilizing the existing reward-free RL solvers, our frameworkprovides sharp sample complexity results for constrained RL in the tabular MDPsetting, matching the best existing results up to a factor of horizondependence; our framework directly extends to a setting of tabular two-playerMarkov games, and gives a new result for constrained RL with linear functionapproximation.