Adapting to Reward Progressivity via Spectral Reinforcement Learning

Abstract

In this paper we consider reinforcement learning tasks with progressive rewards; that is, tasks where the rewards tend to increase in magnitude over time. We hypothesise that this property may be problematic for value-based deep reinforcement learning agents, particularly if the agent must first succeed in relatively unrewarding regions of the task in order to reach more rewarding regions. To address this issue, we propose Spectral DQN, which decomposes the reward into frequencies such that the high frequencies only activate when large rewards are found. This allows the training loss to be balanced so that it gives more even weighting across small and large reward regions. In two domains with extreme reward progressivity, where standard value-based methods struggle significantly, Spectral DQN is able to make much farther progress. Moreover, when evaluated on a set of six standard Atari games that do not overtly favour the approach, Spectral DQN remains more than competitive: While it underperforms one of the benchmarks in a single game, it comfortably surpasses the benchmarks in three games. These results demonstrate that the approach is not overfit to its target problem, and suggest that Spectral DQN may have advantages beyond addressing reward progressivity.
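
To make the idea of a reward decomposition concrete, the following is a minimal sketch of one possible scheme in which higher components stay at zero until the reward is large. It is an illustration under stated assumptions, not necessarily the formulation used by Spectral DQN: the function names, the geometric `base`, the threshold schedule, and the restriction to non-negative rewards are all hypothetical choices made here.

```python
def decompose_reward(reward, base=2.0, num_components=8):
    """Split a non-negative reward into components in [0, 1].

    Hypothetical scheme: component i starts to activate only once the
    reward exceeds the threshold 1 + base + ... + base**(i - 1), and it
    saturates at 1 after a further base**i units of reward.
    """
    components = []
    threshold = 0.0  # reward level at which the current component starts
    scale = 1.0      # base**i, the reward range covered by this component
    for _ in range(num_components):
        value = (reward - threshold) / scale
        components.append(min(1.0, max(0.0, value)))  # clip to [0, 1]
        threshold += scale
        scale *= base
    return components


def reconstruct_reward(components, base=2.0):
    """Invert the decomposition: weight component i by base**i and sum."""
    return sum(c * base**i for i, c in enumerate(components))


if __name__ == "__main__":
    parts = decompose_reward(2.5, base=2.0, num_components=4)
    print(parts)                      # components, lowest frequency first
    print(reconstruct_reward(parts))  # recovers the original reward
```

Because every component lies in [0, 1], a loss computed per component would treat a small reward and a large reward on a comparable scale, which is the kind of balancing the abstract describes. Rewards above the largest representable value (the sum of all `base**i`) would be truncated in this sketch.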
