Learning Routines for Effective Off-Policy Reinforcement Learning

  • 2021-06-05 18:41:57
  • Edoardo Cetin, Oya Celiktutan


The performance of reinforcement learning depends upon designing an appropriate action space, where the effect of each action is measurable yet granular enough to permit flexible behavior. So far, this process has involved non-trivial user choices about the available actions and their execution frequency. We propose a novel framework for reinforcement learning that effectively lifts such constraints. Within our framework, agents learn effective behavior over a routine space: a new, higher-level action space, where each routine represents a set of 'equivalent' sequences of granular actions with arbitrary length. Our routine space is learned end-to-end to facilitate the accomplishment of underlying off-policy reinforcement learning objectives. We apply our framework to two state-of-the-art off-policy algorithms and show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode, improving computational efficiency.
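
The idea of acting over sequences of granular actions can be illustrated with a toy sketch. This is not the paper's method: the routines below are hand-written fixed lists rather than a learned routine space, and `LineWorld`, `execute_routine`, and `run_episode` are hypothetical names introduced only for this example. It shows only that committing to a sequence of actions per decision reduces how many decisions the agent makes in an episode.

```python
from typing import Callable, List, Sequence, Tuple


class LineWorld:
    """Toy 1-D environment (hypothetical): start at 0, episode ends at `goal`."""

    def __init__(self, goal: int = 6) -> None:
        self.goal = goal
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        # A granular action moves one cell left (-1) or right (+1).
        self.pos += action
        done = self.pos == self.goal
        reward = 0.0 if done else -1.0
        return self.pos, reward, done


def execute_routine(
    env: LineWorld, actions: Sequence[int]
) -> Tuple[int, float, int, bool]:
    """Run a sequence of granular actions as one higher-level decision."""
    state, total_reward, steps, done = env.pos, 0.0, 0, False
    for action in actions:
        state, reward, done = env.step(action)
        total_reward += reward
        steps += 1
        if done:  # stop early if the episode ends mid-sequence
            break
    return state, total_reward, steps, done


def run_episode(
    env: LineWorld,
    choose_routine: Callable[[int], List[int]],
    max_decisions: int = 100,
) -> Tuple[int, int]:
    """Return (number of agent decisions, number of granular env steps)."""
    state = env.reset()
    decisions, steps, done = 0, 0, False
    while not done and decisions < max_decisions:
        routine = choose_routine(state)  # one decision yields many actions
        state, _, n_steps, done = execute_routine(env, routine)
        decisions += 1
        steps += n_steps
    return decisions, steps


# One granular action per decision versus a three-action sequence per decision.
per_step = run_episode(LineWorld(goal=6), lambda s: [1])
with_routines = run_episode(LineWorld(goal=6), lambda s: [1, 1, 1])
print(per_step, with_routines)
```

In this sketch both agents take the same number of granular steps, but the second one is queried for a decision less often. How the paper learns its routine space, and how it handles sequences of arbitrary length, is not described in the abstract and is not modeled here.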

