Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

  • 2020-10-06 00:10:16
  • Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, Sheila A. McIlraith
Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to treat reward functions as white boxes instead -- to show the reward function's code to the RL agent so it can exploit its internal structures to learn optimal policies faster. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines (RMs), a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit such structures, including automated reward shaping, task decomposition, and counterfactual reasoning for data augmentation. Experiments on tabular and continuous domains show the benefits of exploiting reward structure across different tasks and RL agents.
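
To make the idea concrete, the following is a minimal sketch of what a reward machine and counterfactual data augmentation might look like. It is an illustration under stated assumptions, not the authors' implementation: the class name `RewardMachine`, its methods, the event labels `"coffee"` and `"office"`, and the example task are all hypothetical. The sketch assumes that the environment reports one high-level event per step and that an event with no listed transition leaves the machine state unchanged with zero reward.

```python
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """Hypothetical minimal reward machine: a finite state machine whose
    transitions are triggered by high-level events and emit rewards."""
    initial_state: int
    terminal_states: frozenset
    transitions: dict  # (state, event) -> (next_state, reward)

    def step(self, state, event):
        # Assumption: unlisted (state, event) pairs self-loop with reward 0.
        return self.transitions.get((state, event), (state, 0.0))

    def states(self):
        # Collect every state mentioned anywhere in the machine.
        found = {self.initial_state} | set(self.terminal_states)
        for (u, _), (v, _) in self.transitions.items():
            found.update((u, v))
        return sorted(found)

    def counterfactual_experiences(self, obs, action, event, next_obs):
        """Turn one environment step into one experience per non-terminal
        machine state, asking what reward each state would have produced."""
        experiences = []
        for u in self.states():
            if u in self.terminal_states:
                continue
            v, reward = self.step(u, event)
            done = v in self.terminal_states
            experiences.append(((obs, u), action, reward, (next_obs, v), done))
        return experiences


# Illustrative task (assumed): observe "coffee" first, then "office".
rm = RewardMachine(
    initial_state=0,
    terminal_states=frozenset({2}),
    transitions={(0, "coffee"): (1, 0.0), (1, "office"): (2, 1.0)},
)
```

Because the machine is visible to the agent, a single observed step can be replayed from every machine state, so an off-policy learner could add all of the returned tuples to its training data rather than only the one matching the current machine state.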


