Abstract
Reinforcement learning has shown great promise in robotics thanks to its ability to develop efficient robotic control procedures through self-training. In particular, reinforcement learning has been successfully applied to solving the reaching task with robotic arms. In this paper, we define a robust, reproducible and systematic experimental procedure to compare the performance of various model-free algorithms at solving this task. The policies are trained in simulation and are then transferred to a physical robotic manipulator. We show that augmenting the reward signal with the Hindsight Experience Replay exploration technique increases the average return of off-policy agents by a factor of 7 to 9 when the target position is initialised randomly at the beginning of each episode.