Abstract
Offline reinforcement learning (RL) is a promising approach for many control applications but faces challenges such as limited data coverage and value function overestimation. In this paper, we propose an implicit actor-critic (iAC) framework that employs optimization solution functions as a deterministic policy (actor) and a monotone function over the optimal value of optimization as a critic. By encoding optimality in the actor policy, we show that the learned policies are robust to the suboptimality of the learned actor parameters via the exponentially decaying sensitivity (EDS) property. We obtain performance guarantees for the proposed iAC framework and show its benefits over general function approximation schemes. Finally, we validate the proposed framework on two real-world applications and show a significant improvement over state-of-the-art (SOTA) offline RL methods.