DEAR: Disentangled Environment and Agent Representations for Reinforcement Learning without Reconstruction

Abstract

Reinforcement Learning (RL) algorithms can learn robotic control tasks from visual observations, but they often require a large amount of data, especially when the visual scene is complex and unstructured. In this paper, we explore how the agent's knowledge of its shape can improve the sample efficiency of visual RL methods. We propose a novel method, Disentangled Environment and Agent Representations (DEAR), that uses the segmentation mask of the agent as supervision to learn disentangled representations of the environment and the agent through feature separation constraints. Unlike previous approaches, DEAR does not require reconstruction of visual observations. These representations are then used as an auxiliary loss to the RL objective, encouraging the agent to focus on the relevant features of the environment. We evaluate DEAR on two challenging benchmarks: Distracting DeepMind control suite and Franka Kitchen manipulation tasks. Our findings demonstrate that DEAR surpasses state-of-the-art methods in sample efficiency, achieving comparable or superior performance with reduced parameters. Our results indicate that integrating agent knowledge into visual RL methods has the potential to enhance their learning efficiency and robustness.
