Abstract
Reinforcement learning has shown great potential in solving complex tasks when large amounts of data can be generated with little effort. In robotics, one approach to generating training data builds on simulations based on dynamics models derived from first principles. However, for tasks that involve, for instance, complex soft robots, devising such models is substantially more challenging. Being able to train effectively with reinforcement learning in increasingly complicated scenarios makes it possible to take advantage of complex systems such as soft robots. Here, we leverage the imbalance in complexity of the dynamics to learn more sample-efficiently. We (i) abstract the task into distinct components, (ii) off-load the parts with simple dynamics into the simulation, and (iii) multiply these virtual parts to generate more data in hindsight. Our new method, Hindsight States (HiS), uses this data and selects the most useful transitions for training. It can be used with an arbitrary off-policy algorithm. We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm, Hindsight Experience Replay (HER). Finally, we evaluate HiS on a physical system and show that it boosts performance on a complex table tennis task with a muscular robot. Videos and code of the experiments can be found at webdav.tuebingen.mpg.de/his/.