Learning by Playing - Solving Sparse Reward Tasks from Scratch

  • 2018-02-28 18:15:49
  • Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, Jost Tobias Springenberg
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors - from scratch - in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment - enabling it to excel at sparse reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach.
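
The scheduling idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class `SoftmaxScheduler`, the helper `store_transition`, the task names, and the softmax-over-average-return rule are all assumptions chosen for illustration. The sketch shows two things: a scheduler that favors tasks whose execution tended to yield higher main-task return, and a shared replay buffer in which every transition is labeled with the reward of every task, so that each task's policy could be updated off-policy from data collected under any other task.

```python
import math
import random


class SoftmaxScheduler:
    """Hypothetical scheduler that picks which task policy to execute next."""

    def __init__(self, tasks, temperature=1.0, seed=0):
        self.tasks = list(tasks)
        self.temperature = temperature
        self.rng = random.Random(seed)
        # Running average of main-task return observed after running each task.
        self.value = {t: 0.0 for t in self.tasks}
        self.count = {t: 0 for t in self.tasks}

    def probabilities(self):
        # Numerically stable softmax over the per-task value estimates.
        m = max(self.value.values())
        weights = [math.exp((self.value[t] - m) / self.temperature)
                   for t in self.tasks]
        total = sum(weights)
        return [w / total for w in weights]

    def choose(self):
        # Sample the next task to execute according to the softmax weights.
        return self.rng.choices(self.tasks, weights=self.probabilities(), k=1)[0]

    def update(self, task, main_task_return):
        # Incremental mean of the main-task return credited to this task.
        self.count[task] += 1
        self.value[task] += (main_task_return - self.value[task]) / self.count[task]


def store_transition(replay, obs, action, next_obs, reward_fns):
    """Append one transition labeled with the reward of every task (assumed layout)."""
    rewards = {name: fn(obs, action, next_obs) for name, fn in reward_fns.items()}
    replay.append((obs, action, next_obs, rewards))
    return rewards
```

A short usage example, again with made-up task names and reward functions:

```python
reward_fns = {
    "main": lambda o, a, n: 1.0 if n >= 3 else 0.0,   # sparse main reward
    "move": lambda o, a, n: 1.0 if n != o else 0.0,   # auxiliary reward
}
scheduler = SoftmaxScheduler(reward_fns.keys(), temperature=0.5)
replay = []
task = scheduler.choose()
rewards = store_transition(replay, obs=2, action=1, next_obs=3, reward_fns=reward_fns)
scheduler.update(task, rewards["main"])
```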


