Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach

Abstract

In many real-world scenarios, reward signals for agents are exceedingly sparse, making it challenging to learn an effective reward function for reward shaping. To address this issue, the approach proposed in this paper performs reward shaping not only by utilizing non-zero-reward transitions but also by employing semi-supervised learning (SSL) combined with a novel data augmentation to learn trajectory space representations from the majority of transitions, i.e., zero-reward transitions, thereby improving the efficacy of reward shaping. Experimental results in Atari and robotic manipulation demonstrate that our method outperforms supervised approaches in reward inference, leading to higher agent scores. Notably, in sparser-reward environments, our method achieves up to twice the peak scores of supervised baselines. The proposed double entropy data augmentation enhances performance, showing a 15.8% increase in best score over other augmentation methods.
