Reinforcement Learning without Ground-Truth State

Abstract

To perform robot manipulation tasks, a low-dimensional state of the environment typically needs to be estimated. However, designing a state estimator can sometimes be difficult, especially in environments with deformable objects. An alternative is to learn an end-to-end policy that maps directly from high-dimensional sensor inputs to actions. However, if this policy is trained with reinforcement learning, then without a state estimator, it is hard to specify a reward function based on high-dimensional observations. To meet this challenge, we propose a simple indicator reward function for goal-conditioned reinforcement learning: we only give a positive reward when the robot's observation exactly matches a target goal observation. We show that by relabeling the original goal with the achieved goal to obtain positive rewards (Andrychowicz et al., 2017), we can learn with the indicator reward function even in continuous state spaces. We propose two methods to further speed up convergence with indicator rewards: reward balancing and reward filtering. We show comparable performance between our method and an oracle which uses the ground-truth state for computing rewards. We show that our method can perform complex tasks in continuous state spaces such as rope manipulation from RGB-D images, without knowledge of the ground-truth state.
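The indicator reward and goal relabeling described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the reward of 0.0 for a non-match, the tuple layout of a transition, and the choice to sample the relabeled goal from a later step of the same trajectory are all assumptions made for illustration. Reward balancing and reward filtering are not shown, because the abstract does not specify them.

```python
import numpy as np


def indicator_reward(observation, goal_observation):
    """Return 1.0 only on an exact match with the goal observation.

    Assumption: a non-matching observation gets 0.0; the abstract only
    says that a positive reward is given on an exact match.
    """
    return 1.0 if np.array_equal(observation, goal_observation) else 0.0


def relabel_with_achieved_goals(trajectory, rng):
    """Relabel each transition's goal with an observation actually achieved.

    `trajectory` is a list of (obs, action, next_obs) tuples (hypothetical
    layout). For each step t, a step index in [t, T) is sampled and that
    step's next observation becomes the new goal. When the sampled step is
    t itself, the indicator reward is positive by construction.
    """
    relabeled = []
    num_steps = len(trajectory)
    for t, (obs, action, next_obs) in enumerate(trajectory):
        future = int(rng.integers(t, num_steps))  # upper bound is exclusive
        new_goal = trajectory[future][2]
        reward = indicator_reward(next_obs, new_goal)
        relabeled.append((obs, action, next_obs, new_goal, reward))
    return relabeled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "images": small random float arrays standing in for RGB-D frames.
    frames = [rng.random((4, 4)) for _ in range(6)]
    trajectory = [(frames[i], i, frames[i + 1]) for i in range(5)]
    for _, action, _, _, reward in relabel_with_achieved_goals(trajectory, rng):
        print(action, reward)
```

In a continuous observation space, a goal sampled independently of the trajectory will almost never be matched exactly, so the indicator reward alone yields essentially no positive examples. Relabeling with an achieved observation is what produces them.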
