To the Max: Reinventing Reward in Reinforcement Learning

  • 2024-07-29 19:07:08
  • Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt

Abstract

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach for using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate its benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
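
To make the contrast between the two objectives concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a finite list of per-step rewards and a discount factor gamma; the function names and the discounted form of the max-reward objective (the largest discounted reward along the trajectory) are illustrative assumptions, since the abstract does not spell out the exact definition.

```python
from typing import Sequence


def cumulative_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Standard discounted return: sum over t of gamma^t * r_t."""
    total = 0.0
    # Accumulate backwards so each step applies one factor of gamma.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def max_reward_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Assumed max-reward objective: max over t of gamma^t * r_t.

    This particular discounted form is an assumption made for
    illustration; the abstract only says the agent optimizes the
    maximum rather than the cumulative reward.
    """
    return max((gamma ** t) * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    # Hypothetical trajectory with a single reward peak at step 2.
    trajectory = [0.2, 0.5, 1.0, 0.3]
    print("cumulative:", cumulative_return(trajectory, gamma=1.0))
    print("max-reward:", max_reward_return(trajectory, gamma=1.0))
```

Under the cumulative objective every step contributes to the score, so reshaping the rewards along the way changes what the agent is pushed toward. Under the max-reward objective only the best step matters, which is one way to read the abstract's motivation for goal-reaching tasks.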

 
