Reinforcement Learning with Goal-Distance Gradient

  • 2020-01-01 02:37:34
  • Kai Jiang, XiaoLong Qin
  • 4


Reinforcement learning usually trains agents on feedback rewards from the environment. However, rewards in real environments are often sparse, and some environments provide no rewards at all. Most current methods struggle to perform well in sparse-reward environments, and for environments without feedback rewards, a reward function must be defined by hand. We present a method that does not rely on environmental rewards, which addresses both problems at once: sparse rewards and the need to define rewards manually. As a result, it can be applied to more complicated environments and to real-world environments. We use the number of steps needed to transition between states as a distance that replaces the environmental rewards. In more complicated environments, the start and the goal can be far apart; to handle this, we add bridge points to our method that establish a connection between the start and the goal. Experiments show that our method can be applied to a wider range of environments, including those where the distance cannot be estimated in advance.
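The central idea, using the number of transitions between states as a distance in place of reward, can be illustrated with a minimal sketch. It assumes a small, fully known grid world in which step distances are computed exactly with breadth-first search. The abstract targets environments where distance cannot be estimated in advance, so this is not the authors' method. `GRID`, `step_distances`, `goal_distance_reward`, and `pick_bridge` are hypothetical names, and the bridge-point rule used here (the cell within `max_hop` steps of the start that is closest to the goal) is an assumed stand-in chosen only for illustration.

```python
from collections import deque

# Hypothetical 4-connected grid world used only for illustration:
# '#' is a wall, '.' is a free cell.
GRID = [
    ".....",
    ".###.",
    ".#...",
    ".#.#.",
    "...#.",
]


def neighbors(cell):
    """Yield the free 4-connected neighbours of a (row, col) cell."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
            yield (nr, nc)


def step_distances(source):
    """Breadth-first search: minimum number of transitions from source to every reachable cell."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        for nxt in neighbors(cell):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                frontier.append(nxt)
    return dist


def goal_distance_reward(state, goal):
    """Reward substitute (assumed form): minus the steps still needed to reach the goal.
    Raises KeyError if the goal is unreachable from state."""
    return -step_distances(state)[goal]


def pick_bridge(start, goal, max_hop):
    """Hypothetical bridge rule: among cells reachable within max_hop steps of start,
    return the one with the fewest remaining steps to the goal."""
    from_start = step_distances(start)
    # Grid moves are reversible, so the distance from b to goal equals the distance from goal to b.
    to_goal = step_distances(goal)
    reachable = [b for b, d in from_start.items() if d <= max_hop and b in to_goal]
    return min(reachable, key=lambda b: to_goal[b])


if __name__ == "__main__":
    start, goal = (0, 0), (2, 3)
    print(step_distances(start)[goal])             # 7, although the Manhattan distance is 5
    print(goal_distance_reward(start, goal))       # -7
    print(pick_bridge((0, 0), (4, 4), max_hop=4))  # (0, 4)
```

The walls make the step distance from `(0, 0)` to `(2, 3)` larger than the straight-line or Manhattan estimate. This is why a distance measured in transitions, rather than one assumed from coordinates, is the quantity of interest. In the bridge example, the agent would first target `(0, 4)` (4 steps away) and then the goal (4 more steps), so each leg stays short.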


