On stabilizing reinforcement learning without Lyapunov functions

Abstract

Reinforcement learning remains one of the major directions in the contemporary development of control engineering and machine learning. Intuitive appeal, flexible problem settings, and ease of application are among the many advantages of this methodology. From the standpoint of machine learning, the main strength of a reinforcement learning agent is its ability to "capture" (learn) the optimal behavior in a given environment. Typically, the agent is built on neural networks, and it is their approximation capabilities that give rise to this belief. From the standpoint of control engineering, however, reinforcement learning has serious deficiencies. The most significant one is the lack of a stability guarantee for the agent-environment closed loop. A great deal of research has been, and continues to be, devoted to stabilizing reinforcement learning. When it comes to stability, the celebrated Lyapunov theory is the de facto tool. It is thus no wonder that so many techniques for stabilizing reinforcement learning rely on Lyapunov theory in one way or another. In control theory, there is an intricate connection between a stabilizing controller and a Lyapunov function, so employing such a pair seems an attractive way to design stabilizing reinforcement learning. However, computing a Lyapunov function is generally a cumbersome process. In this note, we show how to construct a stabilizing reinforcement learning agent that does not employ such a function at all. We only assume that a Lyapunov function exists, which is a natural assumption if the given system (read: environment) is stabilizable, but we do not need to compute one.
