Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning

Abstract

Despite the significant progress of deep reinforcement learning (RL) in solving sequential decision-making problems, RL agents often overfit to training environments and struggle to adapt to new, unseen environments. This prevents robust applications of RL in real-world situations, where system dynamics may deviate wildly from the training settings. In this work, our primary contribution is to propose an information-theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents. We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks; for the first time, we show that agents can generalize to test parameters more than 10 standard deviations away from the training parameter distribution. This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving; it opens doors for the systematic study of generalization from training to extremely different testing settings, focusing on the established connections between information theory and machine learning.
