Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance

Abstract

Many Deep Reinforcement Learning (D-RL) algorithms rely on simple forms of exploration, such as the additive action noise often used in continuous control domains. Typically, the scaling factor of this action noise is chosen as a hyper-parameter and kept constant during training. In this paper, we focus on action noise in off-policy deep reinforcement learning for continuous control. We analyze how the learned policy is affected by the noise type, the noise scale, and the schedule used to reduce the noise scaling factor. We consider the two most prominent types of action noise, Gaussian and Ornstein-Uhlenbeck noise, and perform a vast experimental campaign by systematically varying the noise type and scale parameter, and by measuring variables of interest such as the expected return of the policy and the state-space coverage during exploration. For the latter, we propose a novel state-space coverage measure $\operatorname{X}_{\mathcal{U}\text{rel}}$ that is more robust to boundary artifacts than previously proposed measures. Larger noise scales generally increase state-space coverage. However, we found that increasing the state-space coverage by using a larger noise scale is often not beneficial. On the contrary, reducing the noise scale over the course of training reduces the variance and generally improves the learning performance. We conclude that the best noise type and scale are environment dependent and, based on our observations, derive heuristic rules for guiding the choice of action noise as a starting point for further optimization.
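
For readers unfamiliar with the two noise types, the following is a minimal illustrative sketch, not the implementation used in the paper. It shows additive Gaussian noise (temporally uncorrelated), Ornstein-Uhlenbeck noise (temporally correlated, discretized here with an Euler-Maruyama step), and a linearly decaying scaling factor. All names (`GaussianNoise`, `OrnsteinUhlenbeckNoise`, `linear_scale`, `noisy_action`), the default values of `theta`, `dt`, and `sigma`, the linear form of the schedule, and the action bounds of [-1, 1] are assumptions made for illustration only.

```python
import math
import random


def linear_scale(step, total_steps, start=1.0, end=0.1):
    """Linearly anneal the noise scale from `start` to `end` (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


class GaussianNoise:
    """Temporally uncorrelated noise: every sample is drawn independently."""

    def __init__(self, dim, sigma=1.0, seed=0):
        self.dim = dim
        self.sigma = sigma
        self.rng = random.Random(seed)

    def reset(self):
        pass  # stateless, nothing to reset between episodes

    def sample(self):
        return [self.rng.gauss(0.0, self.sigma) for _ in range(self.dim)]


class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise from dx = -theta * x dt + sigma dW."""

    def __init__(self, dim, sigma=1.0, theta=0.15, dt=1e-2, seed=0):
        self.dim = dim
        self.sigma = sigma
        self.theta = theta  # mean-reversion rate (assumed default)
        self.dt = dt        # integration step (assumed default)
        self.rng = random.Random(seed)
        self.x = [0.0] * dim

    def reset(self):
        self.x = [0.0] * self.dim  # restart the process at its mean

    def sample(self):
        # Euler-Maruyama step: drift toward zero plus a scaled Gaussian increment.
        self.x = [
            x - self.theta * x * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.x
        ]
        return list(self.x)


def noisy_action(policy_action, noise, scale, low=-1.0, high=1.0):
    """Add scaled noise to a deterministic action and clip to the action bounds."""
    eps = noise.sample()
    return [min(max(a + scale * e, low), high) for a, e in zip(policy_action, eps)]
```

In a training loop, one would call `noisy_action(action, noise, linear_scale(step, total_steps))` at every environment step and `noise.reset()` at the start of each episode. Passing a constant `scale` instead recovers the fixed-scale setting described above.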
