Abstract
Training agents to autonomously learn how to use anthropomorphic robotic hands has the potential to lead to systems capable of performing a multitude of complex manipulation tasks in unstructured and uncertain environments. In this work, we first introduce a suite of challenging simulated manipulation tasks that current reinforcement learning and trajectory optimisation techniques find difficult. These include environments where two simulated hands have to pass or throw objects between each other, as well as an environment where the agent must learn to spin a long pen between its fingers. We then introduce a simple trajectory optimisation algorithm that performs significantly better than existing methods on these environments. Finally, on the challenging PenSpin task we combine sub-optimal demonstrations generated through trajectory optimisation with off-policy reinforcement learning, obtaining performance that far exceeds either of these approaches individually, effectively solving the environment.