Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning

Abstract

Reward shaping addresses the challenge of sparse rewards in reinforcement learning by constructing denser and more informative reward signals. To achieve self-adaptive and highly efficient reward shaping, we propose a novel method that incorporates success rates derived from historical experiences into shaped rewards. Our approach utilizes success rates sampled from Beta distributions, which dynamically evolve from uncertain to reliable values as more data is collected. Initially, the self-adaptive success rates exhibit more randomness to encourage exploration. Over time, they become more certain to enhance exploitation, thus achieving a better balance between exploration and exploitation. We employ Kernel Density Estimation (KDE) combined with Random Fourier Features (RFF) to derive the Beta distributions, resulting in a computationally efficient implementation in high-dimensional continuous state spaces. This method provides a non-parametric and learning-free approach. The proposed method is evaluated on a wide range of continuous control tasks with sparse and delayed rewards, demonstrating significant improvements in sample efficiency and convergence stability compared to relevant baselines.
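
The abstract does not spell out how the Beta parameters or the shaped reward are computed, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes (i) the Beta parameters are 1 plus kernel-weighted counts of states seen in successful and failed trajectories, (ii) the KDE uses a Gaussian kernel approximated with Random Fourier Features, and (iii) the shaped reward has the additive form r + weight * p with p sampled from the Beta distribution. The class names `RFFFeatureMap` and `SuccessRateShaper`, and the parameters `bandwidth`, `num_features` and `weight`, are hypothetical illustrations, not the paper's definitions.

```python
import math
import random


class RFFFeatureMap:
    """Random Fourier Features for a Gaussian kernel.

    k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) ~= z(x) . z(y),
    with z(x) = sqrt(2 / D) * cos(W x + b), W ~ N(0, bandwidth^-2 I),
    b ~ Uniform[0, 2*pi).
    """

    def __init__(self, state_dim, num_features, bandwidth, rng):
        self.W = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(state_dim)]
                  for _ in range(num_features)]
        self.b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
        self.scale = math.sqrt(2.0 / num_features)

    def __call__(self, x):
        return [self.scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b_i)
                for w, b_i in zip(self.W, self.b)]


class SuccessRateShaper:
    """Hypothetical shaper: r_shaped = r + weight * p, with p ~ Beta(alpha, beta).

    alpha and beta are 1 plus kernel-weighted (soft) counts of states recorded
    from successful and failed trajectories. Only two running feature sums are
    stored, so query cost does not grow with the number of recorded states.
    """

    def __init__(self, state_dim, num_features=256, bandwidth=0.5,
                 weight=0.1, seed=0):
        self.phi = RFFFeatureMap(state_dim, num_features, bandwidth,
                                 random.Random(seed))
        self.success_sum = [0.0] * num_features
        self.failure_sum = [0.0] * num_features
        self.weight = weight
        self.rng = random.Random(seed + 1)

    def record_trajectory(self, states, succeeded):
        # Accumulate sum_i z(x_i) for the outcome of this trajectory.
        target = self.success_sum if succeeded else self.failure_sum
        for s in states:
            for i, f in enumerate(self.phi(s)):
                target[i] += f

    def pseudo_counts(self, state):
        z = self.phi(state)
        # z(x) . sum_i z(x_i) ~= sum_i k(x, x_i); clip RFF noise at zero.
        n_succ = max(sum(a * c for a, c in zip(z, self.success_sum)), 0.0)
        n_fail = max(sum(a * c for a, c in zip(z, self.failure_sum)), 0.0)
        return 1.0 + n_succ, 1.0 + n_fail

    def sample_success_rate(self, state):
        alpha, beta = self.pseudo_counts(state)
        # Few nearby samples -> wide Beta (more exploration);
        # many nearby samples -> concentrated Beta (more exploitation).
        return self.rng.betavariate(alpha, beta)

    def shaped_reward(self, reward, state):
        return reward + self.weight * self.sample_success_rate(state)
```

For example, after `record_trajectory` is called with many successful states near some state x, `pseudo_counts(x)` returns a large alpha, so sampled success rates cluster near 1, while a state far from all recorded data stays close to the uninformative Beta(1, 1), up to RFF approximation noise. Keeping only the running feature sums is the design choice that makes each query cost O(D * d) for D features and state dimension d, regardless of how many states have been recorded.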
