Exploiting Estimation Bias in Clipped Double Q-Learning for Continuous Control Reinforcement Learning Tasks

Abstract

Continuous control Deep Reinforcement Learning (RL) approaches are known to suffer from estimation biases, leading to suboptimal policies. This paper introduces methods for addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks, using Deep Double Q-Learning. We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent. Most state-of-the-art Deep RL algorithms can be equipped with the BE mechanism without hurting performance or increasing computational complexity. Our extensive experiments across various continuous control tasks demonstrate the effectiveness of our approaches. We show that RL algorithms equipped with this method can match or surpass their counterparts, particularly in environments where estimation biases significantly impact learning. The results underline the importance of bias exploitation in improving policy learning in RL.
