Unsupervised Visual Attention and Invariance for Reinforcement Learning

Abstract

Vision-based reinforcement learning (RL) has achieved tremendous success. However, generalizing a vision-based RL policy to unknown test environments remains a challenging problem. Unlike previous works that focus on training a universal RL policy that is invariant to discrepancies between test and training environments, we focus on developing an independent module that removes interference factors irrelevant to the task, thereby providing "clean" observations for the RL policy. The proposed unsupervised visual attention and invariance method (VAI) contains three key components: 1) an unsupervised keypoint detection model, which captures semantically meaningful keypoints in observations; 2) an unsupervised visual attention module, which automatically generates a distraction-invariant attention mask for each observation; 3) a self-supervised adapter for visual distraction invariance, which reconstructs the distraction-invariant attention mask from observations with artificial disturbances generated by a series of foreground and background augmentations. All components are optimized in an unsupervised way, without manual annotation or access to environment internals, and only the adapter is used at inference time to provide distraction-free observations to the RL policy. VAI empirically shows powerful generalization capabilities and significantly outperforms the current state-of-the-art (SOTA) method by 15% to 49% on the DeepMind Control suite benchmark and by 61% to 229% on our proposed robot manipulation benchmark, in terms of cumulative rewards per episode.
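
To make the idea of a "clean" observation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny grayscale observation and a hand-written binary attention mask; in VAI the mask would instead be predicted by the learned adapter. The helper names `apply_mask` and `augment_background` are hypothetical, and only a background augmentation is illustrated (the foreground augmentations mentioned above are omitted).

```python
import random

def apply_mask(obs, mask):
    """Zero out pixels where the mask is 0; attended pixels pass through."""
    return [[p * m for p, m in zip(obs_row, mask_row)]
            for obs_row, mask_row in zip(obs, mask)]

def augment_background(obs, mask, rng):
    """Replace unattended (mask == 0) pixels with random intensities in [0, 1)."""
    return [[p if m else rng.random() for p, m in zip(obs_row, mask_row)]
            for obs_row, mask_row in zip(obs, mask)]

# Toy 3x3 grayscale observation: bright "agent" pixels on a dim background.
obs = [[0.2, 0.9, 0.2],
       [0.2, 0.8, 0.2],
       [0.2, 0.2, 0.2]]

# Assumed attention mask (1 = task-relevant foreground, 0 = background).
mask = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]

rng = random.Random(0)                             # fixed seed for repeatability
distracted = augment_background(obs, mask, rng)    # simulated visual distraction
clean = apply_mask(distracted, mask)               # what the RL policy would see

# The masked view is the same with or without the background distraction.
print(clean == apply_mask(obs, mask))
```

Under these assumptions, the policy input depends only on the attended pixels, which is the sense in which a correct mask would make observations invariant to background distractions.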
