Value-Based Reinforcement Learning for Continuous Control Robotic Manipulation in Multi-Task Sparse Reward Settings

Abstract

Learning continuous control in high-dimensional sparse reward settings, such as robotic manipulation, is a challenging problem due to the number of samples often required to obtain accurate optimal value and policy estimates. While many deep reinforcement learning methods have aimed at improving sample efficiency through replay or improved exploration techniques, state-of-the-art actor-critic and policy gradient methods still suffer from the hard exploration problem in sparse reward settings. Motivated by recent successes of value-based methods for approximating state-action values, like RBF-DQN, we explore the potential of value-based reinforcement learning for learning continuous robotic manipulation tasks in multi-task sparse reward settings. On robotic manipulation tasks, we empirically show that RBF-DQN converges faster than current state-of-the-art algorithms such as TD3, SAC, and PPO. We also perform ablation studies with RBF-DQN and show that some enhancement techniques for vanilla deep Q-learning, such as Hindsight Experience Replay (HER) and Prioritized Experience Replay (PER), can also be applied to RBF-DQN. Our experimental analysis suggests that value-based approaches may be more sensitive to data augmentation and replay buffer sampling techniques than policy gradient methods, and that the benefits of these methods for robot manipulation are heavily dependent on the transition dynamics of generated subgoal states.
