Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning

Abstract

Generalizing policies to unseen scenarios remains a critical challenge in visual reinforcement learning, where agents often overfit to the specific visual observations of the training environment. In unseen environments, distracting pixels may lead agents to extract representations containing task-irrelevant information. As a result, agents may deviate from the optimal behaviors learned during training, thereby hindering visual generalization. To address this issue, we propose the Salience-Invariant Consistent Policy Learning (SCPL) algorithm, an efficient framework for zero-shot generalization. Our approach introduces a novel value consistency module alongside a dynamics module to effectively capture task-relevant representations. The value consistency module, guided by saliency, ensures the agent focuses on task-relevant pixels in both original and perturbed observations, while the dynamics module uses augmented data to help the encoder capture dynamic- and reward-relevant representations. Additionally, our theoretical analysis highlights the importance of policy consistency for generalization. To strengthen this, we introduce a policy consistency module with a KL divergence constraint to maintain consistent policies across original and perturbed observations. Extensive experiments on the DMC-GB, Robotic Manipulation, and CARLA benchmarks demonstrate that SCPL significantly outperforms state-of-the-art methods in terms of generalization. Notably, SCPL achieves average performance improvements of 14%, 39%, and 69% in the challenging DMC video hard setting, the Robotic hard setting, and the CARLA benchmark, respectively. Project Page: https://sites.google.com/view/scpl-rl.
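
To make the policy consistency idea concrete, the following is a minimal sketch of a KL divergence penalty between the action distributions produced for an original observation and for its perturbed counterpart. It is an illustration under stated assumptions, not the SCPL implementation: it assumes diagonal Gaussian policies, the function names `diag_gaussian_kl` and `policy_consistency_loss` are hypothetical, and the direction of the KL term (original relative to perturbed) is a choice made here that the abstract does not specify.

```python
import math


def diag_gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) for two diagonal Gaussians, summed over action dimensions.

    p is assumed to be the policy on the original observation and q the
    policy on the perturbed observation (an assumption of this sketch).
    """
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        # Closed-form KL between two univariate Gaussians.
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total


def policy_consistency_loss(batch_orig, batch_aug):
    """Mean KL over a batch of (mean, std) pairs.

    batch_orig[i] and batch_aug[i] hold the policy outputs for the same
    underlying state, before and after visual perturbation.
    """
    kls = [
        diag_gaussian_kl(mu_o, std_o, mu_a, std_a)
        for (mu_o, std_o), (mu_a, std_a) in zip(batch_orig, batch_aug)
    ]
    return sum(kls) / len(kls)


# Hypothetical policy outputs for two states with 2-D actions.
orig = [([0.0, 0.5], [1.0, 1.0]), ([0.2, -0.1], [0.5, 0.5])]
aug = [([0.0, 0.5], [1.0, 1.0]), ([0.4, -0.1], [0.5, 0.5])]
loss = policy_consistency_loss(orig, aug)
```

The penalty is zero when the two policies agree and grows as the perturbed observation shifts the action distribution. In a training loop, a term like this would typically be added to the reinforcement learning objective with a weighting coefficient.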
