Abstract
The problem of Reinforcement Learning (RL) in an unknown nonlinear dynamical system is equivalent to the search for an optimal feedback law using simulations/rollouts of the unknown dynamical system. Most RL techniques search over a complex global nonlinear feedback parametrization, which makes them suffer from high training times as well as high variance. Instead, we advocate searching over a local feedback representation consisting of an open-loop sequence and an associated optimal linear feedback law completely determined by the open-loop sequence. We show that this alternate approach results in highly efficient training, that the answers obtained are repeatable and hence reliable, and that the resulting closed-loop performance is superior to global state-of-the-art RL techniques. Finally, replanning whenever required, which is feasible due to the fast and reliable local solution, allows us to recover global optimality of the resulting feedback law.