Self-Supervised Policy Adaptation during Deployment

Abstract

In most real-world scenarios, a policy trained by reinforcement learning in one environment needs to be deployed in another, potentially quite different environment. However, generalization across different environments is known to be hard. A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal. Our work explores the use of self-supervision to allow the policy to continue training after deployment without using any rewards. While previous methods explicitly anticipate changes in the new environment, we assume no prior knowledge of those changes yet still obtain significant improvements. Empirical evaluations are performed on diverse environments from the DeepMind Control suite and ViZDoom. Our method improves generalization in 25 out of 30 environments across various tasks, and outperforms domain randomization on a majority of environments.
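To make the idea of reward-free adaptation at deployment concrete, here is a minimal sketch under stated assumptions. The abstract does not name the self-supervised task, so this sketch assumes an inverse-dynamics objective: predict the executed action from the features of two consecutive observations. It also assumes a tiny linear model written in pure Python. The encoder is shared between the policy head and the self-supervised head. During deployment, only the encoder and the self-supervised head are updated, and no reward is used. All names (`adapt_step`, `inv_dyn`, the dimensions, the transition values) are hypothetical illustrations, not the authors' implementation.

```python
import random

# Hypothetical toy sizes (assumptions for illustration, not values from the paper).
OBS_DIM, FEAT_DIM, ACT_DIM = 4, 3, 2


def init_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init_params(seed=0):
    rng = random.Random(seed)
    return {
        "encoder": init_matrix(FEAT_DIM, OBS_DIM, rng),      # shared; adapted at deployment
        "policy": init_matrix(ACT_DIM, FEAT_DIM, rng),       # frozen at deployment
        "inv_dyn": init_matrix(ACT_DIM, 2 * FEAT_DIM, rng),  # hypothetical self-supervised head
    }


def act(params, obs):
    """Policy output: a linear head on top of the shared encoder features."""
    return matvec(params["policy"], matvec(params["encoder"], obs))


def inverse_dynamics_loss(params, obs, action, next_obs):
    """Predict the executed action from (features(obs), features(next_obs)); no reward needed."""
    # List concatenation: z = [E @ obs, E @ next_obs], length 2 * FEAT_DIM.
    z = matvec(params["encoder"], obs) + matvec(params["encoder"], next_obs)
    pred = matvec(params["inv_dyn"], z)
    residual = [p - a for p, a in zip(pred, action)]
    return 0.5 * sum(r * r for r in residual), z, residual


def adapt_step(params, obs, action, next_obs, lr=0.05):
    """One gradient step on the self-supervised loss; the policy head is never touched."""
    loss, z, residual = inverse_dynamics_loss(params, obs, action, next_obs)
    inv = params["inv_dyn"]
    # dL/dz = inv^T @ residual, computed before inv is modified.
    grad_z = [sum(inv[i][j] * residual[i] for i in range(ACT_DIM))
              for j in range(2 * FEAT_DIM)]
    # dL/d(inv_dyn) = outer(residual, z)
    for i in range(ACT_DIM):
        for j in range(2 * FEAT_DIM):
            inv[i][j] -= lr * residual[i] * z[j]
    # dL/d(encoder) = outer(grad_z[:F], obs) + outer(grad_z[F:], next_obs)
    enc = params["encoder"]
    for k in range(FEAT_DIM):
        g1, g2 = grad_z[k], grad_z[FEAT_DIM + k]
        for j in range(OBS_DIM):
            enc[k][j] -= lr * (g1 * obs[j] + g2 * next_obs[j])
    return loss


params = init_params(seed=0)
policy_before = [row[:] for row in params["policy"]]
# One transition observed in the new environment (made-up values).
obs, action, next_obs = [0.2, -0.1, 0.4, 0.0], [0.5, -0.3], [0.3, -0.2, 0.5, 0.1]
action_before = act(params, obs)
losses = [adapt_step(params, obs, action, next_obs) for _ in range(100)]
print(f"self-supervised loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The policy head's weights never change in this sketch. The policy's behavior still shifts, because it reads features from the encoder that the reward-free objective keeps updating. This shared-encoder design is what lets a self-supervised signal influence control without any reward.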
