A note on stabilizing reinforcement learning

Abstract

Reinforcement learning is a general methodology of adaptive optimal control that has attracted much attention in various fields, ranging from the video game industry to robot manipulators. Despite its remarkable performance demonstrations, plain reinforcement learning controllers do not guarantee stability, which compromises their applicability in industry. To provide such guarantees, measures have to be taken. This gives rise to what could generally be called stabilizing reinforcement learning. Concrete approaches range from the employment of human overseers who filter out unsafe actions to formally verified shields and fusion with classical stabilizing controllers. A line of attack that utilizes elements of adaptive control has become fairly popular in recent years. In this note, we critically address such an approach in a fairly general actor-critic setup for nonlinear continuous-time environments. The actor network utilizes a so-called robustifying term that is supposed to compensate for the neural network errors. The corresponding stability analysis is based on the value function itself. We indicate a problem in such a stability analysis and provide a counterexample to the overall control scheme. Implications for such a line of attack in stabilizing reinforcement learning are discussed. Unfortunately, the said problem possesses no fix without a substantial reconsideration of the whole approach. As a positive message, we derive a stochastic critic neural network weight convergence analysis, provided that the environment is stabilized.
