Reinforcement Learning with Goal-Distance Gradient

  • 2020-01-10 12:26:33
  • Kai Jiang, XiaoLong Qin
Reinforcement learning usually trains agents using reward feedback from the environment. In real environments, however, rewards are sparse, and some environments provide no rewards at all. Most current methods struggle to perform well in sparse-reward or reward-free environments. Although shaped rewards are effective for sparse-reward tasks, they are limited to specific problems, and learning with them is susceptible to local optima. We propose a model-free method that does not rely on environmental rewards to solve the sparse-reward problem in general environments. Our method uses the minimum number of transitions between states as a distance that replaces environmental rewards, and proposes a goal-distance gradient to achieve policy improvement. We also introduce a bridge point planning method, based on the characteristics of our method, to improve exploration efficiency and thereby solve more complex tasks. Experiments show that our method performs better than previous work on sparse-reward and local-optimum problems in complex environments.
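
To make the distance notion concrete, the sketch below computes the minimum number of transitions to a goal on a small toy grid and compares it with the straight-line (Manhattan) distance that a hand-shaped reward might use. It is only an illustration of the quantity named in the abstract, not the authors' method: the abstract does not say how the distance is estimated, and this example assumes a fully known grid and uses exact breadth-first search, which is not model-free. The grid layout and the names `GRID`, `min_transitions`, and `manhattan` are hypothetical.

```python
from collections import deque

# Hypothetical toy grid: '.' is a free cell, '#' is a wall.
GRID = [
    ".....",
    ".###.",
    "..#..",
    ".###.",
]

def min_transitions(grid, goal):
    """Map every reachable cell to its minimum number of transitions to `goal`.

    Moves are the four axis-aligned steps and are reversible, so a
    breadth-first search started at the goal yields the distance to the goal.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def manhattan(a, b):
    """Straight-line grid distance that ignores walls."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

start, goal = (2, 1), (2, 3)
dist = min_transitions(GRID, goal)
print("transition distance:", dist[start])
print("manhattan distance:", manhattan(start, goal))
```

On this grid the start and goal are two cells apart in a straight line but separated by a wall, so the transition distance is much larger. A reward shaped on straight-line distance would pull the agent toward the wall, which is the kind of local optimum the abstract refers to, whereas a signal based on transition counts decreases only along a feasible path.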


