Bootstrapped Reward Shaping

  • 2025-01-02 00:40:55
  • Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni

Abstract

In reinforcement learning, especially in sparse-reward domains, many environment steps are required to observe reward information. In order to increase the frequency of such observations, "potential-based reward shaping" (PBRS) has been proposed as a method of providing a more dense reward signal while leaving the optimal policy invariant. However, the required "potential function" must be carefully designed with task-dependent knowledge to not deter training performance. In this work, we propose a "bootstrapped" method of reward shaping, termed BSRS, in which the agent's current estimate of the state-value function acts as the potential function for PBRS. We provide convergence proofs for the tabular setting, give insights into training dynamics for deep RL, and show that the proposed method improves training speed in the Atari suite.
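
As a rough illustration of the idea in the abstract, the sketch below applies the standard PBRS form, r + gamma * Phi(s') - Phi(s), in tabular Q-learning, with the potential taken from the agent's own value estimate. It is a minimal sketch under stated assumptions and not the authors' implementation: the function names, the scale factor `eta`, the choice V(s) = max_a Q(s, a), and the zero potential at terminal states are all assumptions made here for illustration.

```python
import numpy as np


def shaped_reward(q_table, state, next_state, reward,
                  gamma=0.99, eta=0.5, done=False):
    """Return the reward plus a potential-based shaping term.

    Assumption: the potential is Phi(s) = eta * max_a Q(s, a), i.e. a
    scaled copy of the current state-value estimate. `eta` is a
    hypothetical scale parameter introduced for this sketch.
    """
    phi_s = eta * q_table[state].max()
    # Assumption: the potential of a terminal state is taken to be zero.
    phi_next = 0.0 if done else eta * q_table[next_state].max()
    return reward + gamma * phi_next - phi_s


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99, eta=0.5):
    """One tabular Q-learning step that uses the shaped reward in place of r."""
    r = shaped_reward(q_table, state, next_state, reward,
                      gamma=gamma, eta=eta, done=done)
    bootstrap = 0.0 if done else gamma * q_table[next_state].max()
    q_table[state, action] += alpha * (r + bootstrap - q_table[state, action])
    return q_table
```

Because the potential is recomputed from `q_table` on every call, the shaping term changes as the value estimate changes, which is what distinguishes this bootstrapped variant from a fixed, hand-designed potential function.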

 
