On the Search for Feedback in Reinforcement Learning

Abstract

The problem of Reinforcement Learning (RL) in an unknown nonlinear dynamical system is equivalent to the search for an optimal feedback law using simulations/rollouts of the unknown dynamical system. Most RL techniques search over a complex global nonlinear feedback parametrization, which makes them suffer from high training times as well as high variance. Instead, we advocate searching over a local feedback representation consisting of an open-loop sequence and an associated optimal linear feedback law completely determined by the open-loop sequence. We show that this alternate approach results in highly efficient training, that the answers obtained are repeatable and hence reliable, and that the resulting closed-loop performance is superior to global state-of-the-art RL techniques. Finally, replanning whenever required, which is feasible due to the fast and reliable local solution, allows us to recover global optimality of the resulting feedback law.
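To make the local feedback representation concrete, here is a minimal sketch of one common way to attach a linear feedback law to a given open-loop sequence: a time-varying LQR computed by a backward Riccati recursion around the nominal trajectory. The abstract does not specify the construction, so this is an illustrative assumption, not the authors' algorithm. The linearizations A_seq and B_seq (assumed to be already estimated along the nominal trajectory, for example from rollouts), the cost weights Q, R, Q_final, and both function names are hypothetical.

```python
import numpy as np

def tv_lqr_gains(A_seq, B_seq, Q, R, Q_final):
    """Backward Riccati recursion for a time-varying LQR (assumed construction).

    A_seq, B_seq: per-step linearizations along the nominal trajectory
                  (hypothetical inputs, assumed given).
    Returns gains K_t such that the control correction is -K_t @ dx_t.
    """
    P = Q_final  # cost-to-go matrix at the final time
    gains = []
    for A, B in zip(reversed(A_seq), reversed(B_seq)):
        # K_t = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_t = Q + A^T P (A - B K_t)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # reorder so gains[t] belongs to time step t

def local_feedback(x, t, x_nom, u_nom, gains):
    """Open-loop control plus a linear correction on the state deviation."""
    return u_nom[t] - gains[t] @ (x - x_nom[t])
```

In this sketch the gains depend only on quantities evaluated along the open-loop trajectory, which is the sense in which the feedback is determined by the open loop. Replanning would amount to recomputing the nominal sequence and gains from the current state.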
