Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

Abstract

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must interact extensively with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible: to show the reward function's code to the RL agent so that it can exploit the function's internal structure to learn optimal policies in a more sample-efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of the resulting policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and, as such, support loops, sequences, and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
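
To make the idea of a reward machine concrete, here is a minimal Python sketch of one, together with one possible reading of "counterfactual reasoning with off-policy learning": a single environment transition is replayed as if it had occurred in every reward machine state. This is an illustration under simplifying assumptions, not the authors' implementation. Each edge is assumed to fire on a single proposition (a full reward machine may condition on richer conditions over the observed propositions), rewards are plain scalars, and the names `RewardMachine`, `counterfactual_experiences`, and the coffee/office task are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Tuple

Label = FrozenSet[str]  # propositions that hold after one environment step


@dataclass
class RewardMachine:
    """Toy reward machine: a finite state machine whose edges emit rewards.

    Assumption: each edge fires on a single proposition; a full reward machine
    may condition on richer conditions over the observed label.
    """
    initial_state: int
    terminal_states: FrozenSet[int]
    # (rm_state, proposition) -> (next_rm_state, reward)
    edges: Dict[Tuple[int, str], Tuple[int, float]]

    def states(self) -> List[int]:
        found = {self.initial_state} | set(self.terminal_states)
        for (u, _), (v, _) in self.edges.items():
            found.update((u, v))
        return sorted(found)

    def step(self, u: int, label: Label) -> Tuple[int, float]:
        """Return (next RM state, reward) after observing `label` in RM state u."""
        # Propositions are checked in sorted order so that ties are deterministic.
        for prop in sorted(label):
            if (u, prop) in self.edges:
                return self.edges[(u, prop)]
        return u, 0.0  # no edge fires: self-loop with zero reward


def counterfactual_experiences(rm: RewardMachine, s: Any, a: Any,
                               s_next: Any, label: Label) -> List[tuple]:
    """Reuse one environment transition as if it happened in every RM state.

    Each tuple is ((s, u), a, r, (s_next, u_next), done) and could be fed to an
    off-policy learner such as Q-learning over the product state (s, u).
    """
    experiences = []
    for u in rm.states():
        if u in rm.terminal_states:
            continue  # no decisions are made from a terminal RM state
        u_next, r = rm.step(u, label)
        done = u_next in rm.terminal_states
        experiences.append(((s, u), a, r, (s_next, u_next), done))
    return experiences


# Hypothetical task: get coffee, then reach the office (reward 1 on completion).
rm = RewardMachine(
    initial_state=0,
    terminal_states=frozenset({2}),
    edges={(0, "coffee"): (1, 0.0), (1, "office"): (2, 1.0)},
)
u, r = rm.step(rm.initial_state, frozenset({"coffee"}))
print(u, r)  # 1 0.0
u, r = rm.step(u, frozenset({"office"}))
print(u, r)  # 2 1.0

# One observed transition yields one experience per non-terminal RM state.
exps = counterfactual_experiences(rm, "s3", "up", "s4", frozenset({"office"}))
for e in exps:
    print(e)
```

Because the agent tracks the reward machine state u alongside the environment state s, a reward that depends on history (coffee first, office second) becomes Markovian over the pair (s, u). The counterfactual step exploits the exposed structure: the same move into the office teaches the learner both that it earns nothing before coffee (u = 0) and that it completes the task after coffee (u = 1).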
