Bridging Model-based Safety and Model-free Reinforcement Learning through System Identification of Low Dimensional Linear Models

Abstract

Bridging model-based safety and model-free reinforcement learning (RL) for dynamic robots is appealing since model-based methods are able to provide formal safety guarantees, while RL-based methods are able to exploit the robot's agility by learning from the full-order system dynamics. However, current approaches to this problem are mostly restricted to simple systems. In this paper, we propose a new method to combine model-based safety with model-free reinforcement learning by explicitly finding a low-dimensional model of the system controlled by an RL policy and applying stability and safety guarantees on that simple model. We use a complex bipedal robot, Cassie, which is a high-dimensional nonlinear system with hybrid dynamics and underactuation, and its RL-based walking controller as an example. We show that a low-dimensional dynamical model is sufficient to capture the dynamics of the closed-loop system. We demonstrate that this model is linear, asymptotically stable, and decoupled across control inputs in all dimensions. We further show that such linearity exists even when using different RL control policies. These results point to an interesting direction for understanding the relationship between RL and optimal control: whether RL tends to linearize the nonlinear system during training in some cases. Furthermore, we illustrate that the identified linear model is able to provide guarantees through a safety-critical optimal control framework, e.g., Model Predictive Control with Control Barrier Functions, on an example of autonomous navigation using Cassie while taking advantage of the agility provided by the RL-based controller.
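To make the idea of identifying a linear, decoupled, asymptotically stable model concrete, the following is a minimal sketch rather than the procedure used in the paper. It assumes, purely for illustration, that one decoupled dimension of the closed-loop system behaves like a first-order discrete-time model x[k+1] = a*x[k] + b*u[k], where u is the command given to the RL policy and x is the resulting measured quantity. The model order, the function names `fit_first_order` and `simulate`, and the synthetic data are all hypothetical and do not come from the abstract.

```python
import random


def fit_first_order(xs, us):
    """Least-squares fit of x[k+1] = a*x[k] + b*u[k] for one dimension.

    xs holds len(us) + 1 state samples; us holds the inputs applied between them.
    """
    prev, nxt = xs[:-1], xs[1:]
    # Entries of the 2x2 normal equations for the unknowns (a, b).
    sxx = sum(x * x for x in prev)
    sxu = sum(x * u for x, u in zip(prev, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(prev, nxt))
    suy = sum(u * y for u, y in zip(us, nxt))
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("data does not excite the system enough to identify it")
    a = (sxy * suu - suy * sxu) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


def simulate(a, b, us, x0=0.0, noise=0.0, seed=0):
    """Generate synthetic data from the assumed first-order model (illustration only)."""
    rng = random.Random(seed)
    xs = [x0]
    for u in us:
        xs.append(a * xs[-1] + b * u + rng.gauss(0.0, noise))
    return xs


# Synthetic stand-in for logged command/response data; the values are made up.
rng = random.Random(1)
us = [rng.uniform(-1.0, 1.0) for _ in range(500)]
xs = simulate(0.9, 0.1, us, noise=0.01)

a_hat, b_hat = fit_first_order(xs, us)
# A scalar discrete-time linear model is asymptotically stable when |a| < 1.
is_stable = abs(a_hat) < 1.0
```

Under these assumptions, a fit with a small residual would support the linearity claim for that dimension, and the magnitude of the estimated `a_hat` gives a direct stability check. Real logged data would replace the synthetic `xs` and `us`.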
