RLOC: Neurobiologically Inspired Hierarchical Reinforcement Learning Algorithm for Continuous Control of Nonlinear Dynamical Systems

  • 2019-03-07 17:37:53
  • Ekaterina Abramova, Luke Dickens, Daniel Kuhn, Aldo Faisal


Nonlinear optimal control problems are often solved with numerical methods that require knowledge of the system's dynamics, which may be difficult to infer, and that carry a large computational cost associated with iterative calculations. We present a novel neurobiologically inspired hierarchical learning framework, Reinforcement Learning Optimal Control (RLOC), which operates on two levels of abstraction and utilises a reduced number of controllers to solve nonlinear systems with unknown dynamics in continuous state and action spaces. Our approach is inspired by research at two levels of abstraction: first, at the level of limb coordination, human behaviour is explained by linear optimal feedback control theory. Second, in cognitive tasks involving learning of symbolic-level action selection, humans learn such problems using model-free and model-based reinforcement learning algorithms. We propose that combining these two levels of abstraction leads to a fast global solution of nonlinear control problems using a reduced number of controllers. Our framework learns the local task dynamics from naive experience and forms locally optimal infinite-horizon Linear Quadratic Regulators, which produce continuous low-level control. A top-level reinforcement learner uses the controllers as actions and learns how to best combine them in state space while maximising a long-term reward. A single optimal control objective function drives high-level symbolic learning by providing training signals on the desirability of each selected controller. We show that a small number of locally optimal linear controllers are able to solve global nonlinear control problems with unknown dynamics when combined with a reinforcement learner in this hierarchical framework. Our algorithm competes in terms of computational cost and solution quality with sophisticated control algorithms, and we illustrate this with solutions to benchmark problems.
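
The two levels described above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a local linear model x[k+1] = A x[k] + B u[k] has already been estimated, computes an infinite-horizon LQR gain by iterating the discrete Riccati equation, and uses a tabular Q-learning update as a stand-in for the top-level learner (the abstract does not say which reinforcement learning rule is used). The function names, the double-integrator example, the cost weights and the learning parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=1000, tol=1e-10):
    """Infinite-horizon discrete-time LQR gain via Riccati iteration.

    Returns K such that u = -K x, and the cost-to-go matrix P.
    """
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


def q_update(q_table, s, a, reward, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step; an action `a` is the index of a controller.

    Assumption: `reward` would be the negative quadratic cost accumulated
    while controller `a` was active in symbolic state `s`.
    """
    td_target = reward + gamma * np.max(q_table[s_next])
    q_table[s, a] += alpha * (td_target - q_table[s, a])
    return q_table


# Hypothetical local model: a double integrator sampled at dt = 0.1.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)          # state cost weights (illustrative)
R = np.array([[1.0]])  # control cost weight (illustrative)

K, P = lqr_gain(A, B, Q, R)

# Closed-loop dynamics under the low-level controller u = -K x.
closed_loop = A - B @ K
spectral_radius = np.max(np.abs(np.linalg.eigvals(closed_loop)))

# High-level table: 3 symbolic states, 2 candidate controllers (illustrative).
q_table = np.zeros((3, 2))
q_table = q_update(q_table, s=0, a=1, reward=-1.0, s_next=2)
```

In a full hierarchical scheme of this kind, one such gain would be computed per linearisation point, and the high-level table would be indexed by a discretisation of the state space; both choices are left open here.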

