Computationally efficient Gauss-Newton reinforcement learning for model predictive control

Abstract

Model predictive control (MPC) is widely used in process control due to its interpretability and ability to handle constraints. As a parametric policy in reinforcement learning (RL), MPC offers strong initial performance and low data requirements compared to black-box policies like neural networks. However, most RL methods rely on first-order updates, which scale well to large parameter spaces but converge at most linearly, making them inefficient when each policy update requires solving an optimal control problem, as is the case with MPC. While MPC policies are typically sparsely parameterized and thus amenable to second-order approaches, existing second-order methods demand second-order policy derivatives, which can be computationally and memory-wise intractable. This work introduces a Gauss-Newton approximation of the deterministic policy Hessian that eliminates the need for second-order policy derivatives, enabling superlinear convergence with minimal computational overhead. To further improve robustness, we propose a momentum-based Hessian averaging scheme for stable training under noisy estimates. We demonstrate the effectiveness of the approach on a nonlinear continuously stirred tank reactor (CSTR), showing faster convergence and improved data efficiency over state-of-the-art first-order methods.
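To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a cost-minimization convention, and it assumes the Gauss-Newton approximation takes the generic form of an average of J^T H_Q J over samples, where J is the policy Jacobian with respect to the parameters and H_Q is the Hessian of the action-value function with respect to the action. Only first-order policy derivatives appear. The function names, the exponential-averaging form of the momentum scheme, and the parameters beta and damping are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def gauss_newton_hessian(policy_jacobians, q_action_hessians):
    """Sample average of J_i^T H_i J_i (assumed Gauss-Newton form).

    policy_jacobians:  shape (N, n_a, n_theta), d pi / d theta per sample
    q_action_hessians: shape (N, n_a, n_a), d^2 Q / d a^2 per sample
    The second-order policy term (d^2 pi / d theta^2) is dropped.
    """
    J = np.asarray(policy_jacobians, dtype=float)
    Hq = np.asarray(q_action_hessians, dtype=float)
    return np.einsum("nai,nab,nbj->ij", J, Hq, J) / J.shape[0]


def averaged_hessian(previous, estimate, beta=0.9):
    """Exponential moving average of noisy Hessian estimates (assumed form)."""
    if previous is None:
        return estimate
    return beta * previous + (1.0 - beta) * estimate


def newton_step(theta, gradient, hessian, damping=1e-6):
    """Damped Newton update for minimization: theta - (H + damping*I)^-1 g."""
    regularized = hessian + damping * np.eye(hessian.shape[0])
    return theta - np.linalg.solve(regularized, gradient)
```

In a training loop under these assumptions, each iteration would estimate the gradient and the Gauss-Newton Hessian from a batch, pass the Hessian through averaged_hessian, and then call newton_step. For a sparsely parameterized MPC policy, the linear solve is small because its size is the number of policy parameters.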
