To the Max: Reinventing Reward in Reinforcement Learning

Abstract

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach for using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate its benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
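
To make the contrast between the two objectives concrete, here is a minimal sketch that scores a fixed reward sequence in both ways. It is an illustration only, not the paper's algorithm: the function names, the toy trajectories, and the choice to discount the maximum as max over t of gamma^t * r_t are assumptions made for this example.

```python
from typing import Sequence


def cumulative_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Standard RL objective: discounted sum of rewards along a trajectory."""
    return sum(gamma**t * r for t, r in enumerate(rewards))


def max_reward_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Max-reward objective (assumed form): largest discounted reward seen."""
    return max(gamma**t * r for t, r in enumerate(rewards))


# Hypothetical trajectories: A collects small rewards at every step,
# B collects nothing until it reaches a high-reward state at the end.
traj_a = [1, 1, 1, 1, 1]
traj_b = [0, 0, 0, 0, 3]

for name, traj in (("A", traj_a), ("B", traj_b)):
    print(name, cumulative_return(traj), max_reward_return(traj))
```

With no discounting, the cumulative objective ranks trajectory A above B, while the max-reward objective ranks B above A, which is the kind of difference in preference the abstract refers to.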
