Constrained Model-Free Reinforcement Learning for Process Optimization

Abstract

Reinforcement learning (RL) is a control approach that can handle nonlinear stochastic optimal control problems. However, despite the promise exhibited, RL has yet to see marked translation to industrial practice, primarily due to its inability to satisfy state constraints. In this work we aim to address this challenge. We propose an 'oracle'-assisted constrained Q-learning algorithm that guarantees the satisfaction of joint chance constraints with a high probability, which is crucial for safety-critical tasks. To achieve this, constraint tightenings (backoffs) are introduced and adjusted using Broyden's method, hence making them self-tuned. This results in a general methodology that can be imbued into approximate dynamic programming-based algorithms to ensure constraint satisfaction with high probability. Finally, we present case studies that analyze the performance of the proposed approach and compare this algorithm with model predictive control (MPC). The favorable performance of this algorithm signifies a step toward the incorporation of RL into real-world optimization and control of engineering systems, where constraints are essential in ensuring safety.
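
As a rough illustration of the self-tuned backoff idea only, and not the algorithm proposed in the paper, the sketch below tunes a single scalar backoff so that a toy constraint is satisfied with a target probability. Everything in it is an assumption made for illustration: the function names, the Gaussian disturbance with standard deviation `sigma`, the 0.95 target, and the closed-form satisfaction probability, which stands in for an estimate that would in practice come from closed-loop simulations. In one dimension, Broyden's rank-one Jacobian update reduces to the secant slope.

```python
import math


def satisfaction_probability(backoff, sigma=0.5):
    """P(x <= x_max) for a toy system (assumed, not from the paper).

    The policy aims at the tightened bound x_max - backoff and the state is
    perturbed by noise ~ N(0, sigma^2), so the constraint holds whenever
    noise <= backoff.
    """
    return 0.5 * (1.0 + math.erf(backoff / (sigma * math.sqrt(2.0))))


def tune_backoff(target=0.95, b0=0.0, jac0=1.0, tol=1e-9, max_iter=50):
    """Scalar Broyden iteration on F(b) = satisfaction_probability(b) - target."""
    b = b0
    f = satisfaction_probability(b) - target
    jac = jac0  # initial Jacobian guess (hypothetical choice)
    for _ in range(max_iter):
        if abs(f) < tol:
            break
        b_new = b - f / jac  # quasi-Newton step
        f_new = satisfaction_probability(b_new) - target
        step = b_new - b
        # Broyden rank-one update; in 1-D this equals the secant slope.
        jac = jac + (f_new - f - jac * step) / step
        b, f = b_new, f_new
    return b


if __name__ == "__main__":
    b_star = tune_backoff()
    print("tuned backoff:", b_star)
    print("satisfaction probability:", satisfaction_probability(b_star))
```

In this toy setting the tuned backoff can be checked against the Gaussian quantile (about 1.645 standard deviations for a 0.95 target). A multi-constraint version would replace the scalar slope with a Jacobian matrix and the closed-form probability with an empirical one.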
