Hyperbolically-Discounted Reinforcement Learning on Reward-Punishment Framework

Abstract

This paper proposes a new reinforcement learning method with hyperbolic discounting. A new scheme for learning the optimal policy is derived by combining a new temporal difference error, which applies hyperbolic discounting in a recursive manner, with the reward-punishment framework. In simulations, the proposed method outperforms standard reinforcement learning, although its performance depends on the design of the reward and punishment. In addition, the averages of the discount factors w.r.t. reward and punishment differ from each other, resembling the sign effect observed in animal behavior.
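For readers unfamiliar with the term, the following minimal sketch contrasts the standard exponential discount weight, gamma^t, with the textbook hyperbolic form 1 / (1 + k t). It is not the recursive temporal difference scheme proposed in the paper, which the abstract does not specify; the function names and the values of `gamma` and `k` are illustrative assumptions.

```python
def exponential_discount(t, gamma=0.9):
    """Standard RL discount weight for a reward t steps ahead."""
    return gamma ** t


def hyperbolic_discount(t, k=1.0):
    """Textbook hyperbolic discount weight; k is an assumed sensitivity parameter."""
    return 1.0 / (1.0 + k * t)


def effective_step_factor(t, k=1.0):
    """Ratio of consecutive hyperbolic weights, d(t+1) / d(t).

    Under exponential discounting this ratio is the constant gamma;
    under hyperbolic discounting it grows toward 1 as the delay t increases.
    """
    return hyperbolic_discount(t + 1, k) / hyperbolic_discount(t, k)


if __name__ == "__main__":
    for t in (0, 1, 5, 20):
        print(t, exponential_discount(t), hyperbolic_discount(t), effective_step_factor(t))
```

Because the effective per-step factor is not constant under hyperbolic discounting, a single fixed discount factor cannot reproduce it, which is presumably why the abstract reports averages of discount factors rather than one value.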
