Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks

Abstract

This paper focuses on finding reinforcement learning policies for control systems with hard state and action constraints. Despite its success in many domains, reinforcement learning is challenging to apply to problems with hard constraints, especially if both the state variables and actions are constrained. Previous works seeking to ensure constraint satisfaction, or safety, have focused on adding a projection step to a learned policy. Yet, this approach requires solving an optimization problem at every policy execution step, which can lead to significant computational costs. To tackle this problem, this paper proposes a new approach, termed Vertex Networks (VNs), with guarantees on safety during exploration and on learned control policies by incorporating the safety constraints into the policy network architecture. Leveraging the geometric property that all points within a convex set can be represented as the convex combination of its vertices, the proposed algorithm first learns the convex combination weights and then uses these weights along with the pre-calculated vertices to output an action. The output action is guaranteed to be safe by construction. Numerical examples illustrate that the proposed VN algorithm outperforms vanilla reinforcement learning in a variety of benchmark control tasks.
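
The convex-combination idea can be illustrated with a minimal sketch. It is not the paper's implementation: the softmax normalization, the function names `softmax` and `vertex_action`, and the box-shaped safe set `BOX_VERTICES` are all assumptions chosen for illustration. The abstract states only that the network learns convex combination weights and applies them to pre-calculated vertices.

```python
import math


def softmax(logits):
    """Map raw scores to non-negative weights that sum to 1.

    Assumption: softmax is used here only as one common way to obtain
    valid convex-combination weights; the abstract does not specify it.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vertex_action(logits, vertices):
    """Return the convex combination of `vertices` weighted by softmax(logits).

    Because the weights are non-negative and sum to 1, the result lies
    inside the convex hull of the vertices for any choice of logits.
    """
    weights = softmax(logits)
    dim = len(vertices[0])
    return [
        sum(w * v[i] for w, v in zip(weights, vertices))
        for i in range(dim)
    ]


# Hypothetical safe action set: the box [-1, 1] x [0, 2], given by its corners.
BOX_VERTICES = [(-1.0, 0.0), (1.0, 0.0), (1.0, 2.0), (-1.0, 2.0)]

# Stand-in for the raw output of a policy network (one score per vertex).
example_logits = [0.3, -1.2, 2.0, 0.5]
action = vertex_action(example_logits, BOX_VERTICES)
```

Whatever values `example_logits` takes, `action` stays inside the box, so no projection step or per-step optimization is needed to enforce the constraint. In the setting the abstract describes, the vertices would presumably be those of the safe action set for the current state rather than a fixed box.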
