Learning Routines for Effective Off-Policy Reinforcement Learning

Abstract

The performance of reinforcement learning depends upon designing an appropriate action space, where the effect of each action is measurable, yet, granular enough to permit flexible behavior. So far, this process involved non-trivial user choices in terms of the available actions and their execution frequency. We propose a novel framework for reinforcement learning that effectively lifts such constraints. Within our framework, agents learn effective behavior over a routine space: a new, higher-level action space, where each routine represents a set of 'equivalent' sequences of granular actions with arbitrary length. Our routine space is learned end-to-end to facilitate the accomplishment of underlying off-policy reinforcement learning objectives. We apply our framework to two state-of-the-art off-policy algorithms and show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode, improving computational efficiency.
