Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning

Abstract

Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image, which can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, we also find that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
