A framework for online, stabilizing reinforcement learning

Abstract

Online reinforcement learning is concerned with training an agent on the fly via dynamic interaction with the environment. Here, due to the specifics of the application, it is not generally possible to perform long pre-training, as is commonly done in offline, model-free approaches, which are akin to dynamic programming. Such applications are found more frequently in industry than in purely digital fields, such as cloud services, video games, and database management, where reinforcement learning has been demonstrating success. Online reinforcement learning, in contrast, is more akin to classical control, which utilizes some model knowledge about the environment. Stability of the closed loop (the agent plus the environment) is a major challenge for such online approaches. In this paper, we tackle this problem by a special fusion of online reinforcement learning with elements of classical control, namely, the Lyapunov theory of stability. The idea is to start the agent at once, without pre-training, and learn an approximately optimal policy under specially designed constraints that guarantee stability. The resulting approach was tested in an extensive experimental study with a mobile robot. A nominal parking controller was used as a baseline. It was observed that the suggested agent could always successfully park the robot, while significantly improving the cost. While many approaches may be exploited for mobile robot control, we suggest that the experiments showed the promising potential of online reinforcement learning agents based on Lyapunov-like constraints. The presented methodology may be utilized in safety-critical industrial applications where stability is necessary.
