Projection Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning

Abstract

Offline Reinforcement Learning (RL) faces a critical challenge: extrapolation error caused by out-of-distribution (OOD) actions. The Implicit Q-Learning (IQL) algorithm employs expectile regression to achieve in-sample learning, effectively mitigating the risks associated with OOD actions. However, its fixed hyperparameter in policy evaluation and its density-based policy improvement method limit its overall efficiency. In this paper, we propose Proj-IQL, a projective IQL algorithm enhanced with a support constraint. In the policy evaluation phase, Proj-IQL generalizes the one-step approach to a multi-step approach through vector projection, while maintaining in-sample learning and the expectile regression framework. In the policy improvement phase, Proj-IQL introduces a support constraint that is more closely aligned with the policy evaluation approach. Furthermore, we theoretically demonstrate that Proj-IQL guarantees monotonic policy improvement and enjoys a progressively more rigorous criterion for superior actions. Empirical results demonstrate that Proj-IQL achieves state-of-the-art performance on D4RL benchmarks, especially in challenging navigation domains.
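
To make the expectile regression idea concrete, here is a minimal sketch of the standard IQL-style objective that Proj-IQL builds on. It is not Proj-IQL's multi-step vector projection, which this abstract does not specify. The function names `expectile_loss` and `expectile` are hypothetical. The samples play the role of Q(s, a) values at dataset actions for one fixed state s. Fitting V(s) as their tau-expectile gives the mean when tau = 0.5 and moves toward the best in-sample action value as tau approaches 1, without ever querying OOD actions.

```python
from typing import Sequence


def expectile_loss(u: float, tau: float) -> float:
    """Asymmetric squared loss |tau - 1(u < 0)| * u^2 used in expectile regression."""
    weight = (1.0 - tau) if u < 0 else tau
    return weight * u * u


def expectile(samples: Sequence[float], tau: float,
              iters: int = 200, tol: float = 1e-12) -> float:
    """Return the tau-expectile of `samples` via iteratively reweighted averaging.

    The minimiser m of sum_i expectile_loss(x_i - m, tau) satisfies
    sum_i w_i * (x_i - m) = 0, with w_i = tau if x_i > m else 1 - tau,
    so m is a weighted mean. We iterate that fixed point until it stops moving.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = sum(samples) / len(samples)  # start at the mean (the 0.5-expectile)
    for _ in range(iters):
        weights = [tau if x > m else 1.0 - tau for x in samples]
        new_m = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


# Hypothetical in-sample Q-values for a single state.
q_values = [0.0, 1.0, 2.0, 3.0, 4.0]
for tau in (0.5, 0.9, 0.99):
    print(tau, expectile(q_values, tau))
```

Raising tau moves the fitted value from the mean (2.0) toward the largest in-sample Q-value (4.0) without reaching it. This is why a single fixed tau acts as a trade-off hyperparameter between conservatism and optimism.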
