Abstract
The reinforcement learning (RL) and model predictive control (MPC) communities have developed vast ecosystems of theoretical approaches and computational tools for solving optimal control problems. Given their conceptual similarities but differing strengths, there has been increasing interest in synergizing RL and MPC. However, existing approaches tend to be limited for various reasons, including the computational cost of solving MPC problems inside an RL algorithm and the software hurdles to seamlessly integrating MPC and RL tools. These challenges often result in the use of "simple" MPC schemes or RL algorithms, neglecting the state of the art in both areas. This paper presents MPCritic, a machine learning-friendly architecture that interfaces seamlessly with MPC tools. MPCritic utilizes the loss landscape defined by a parameterized MPC problem, focusing on "soft" optimization over batched training steps, thereby updating the MPC parameters while avoiding costly minimization and the computation of parametric sensitivities. Since the MPC structure is preserved during training, an MPC agent can be readily used for online deployment, where robust constraint satisfaction is paramount. We demonstrate the versatility of MPCritic, in terms of the MPC architectures and RL algorithms it can accommodate, on classic control benchmarks.
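The abstract does not spell out the training loop. As a purely illustrative sketch of what "soft" optimization over batched training steps can look like, the toy below treats a parameterized MPC cost as a critic, Q_p(x_0, u_0) = min over u_1..u_{N-1} of J_p(x_0, u_0, u_1..u_{N-1}). Instead of solving that inner minimization at every step, it keeps the planned inputs of each sample in the batch as variables, takes one gradient step on them, and then updates the parameter p by a damped least-squares fit to Q-value targets at the current, not-yet-optimal plan. Everything here is an assumption for illustration: the scalar linear system, the quadratic cost with a learnable terminal weight p, the synthetic targets generated from p_true = 3, and the hypothetical helpers cost_terms, exact_plan, and train. It is not the authors' implementation and uses no MPC or RL library.

```python
import numpy as np

# Hypothetical toy problem (all numbers are illustrative assumptions):
# x_{k+1} = A*x_k + B*u_k, horizon N, cost Q_W*sum x_k^2 + R_W*sum u_k^2 + p*x_N^2.
A, B, N = 1.1, 0.5, 5
Q_W, R_W = 1.0, 0.5

# Prediction matrices: X = F*x0 + G @ u, with X = [x_1..x_N] and u = [u_0..u_{N-1}].
F = np.array([A ** (k + 1) for k in range(N)])
G = np.array([[A ** (k - j) * B if j <= k else 0.0 for j in range(N)] for k in range(N)])

def weights(p):
    """State weights for x_1..x_N; the last entry is the learnable terminal weight p."""
    return np.array([Q_W] * (N - 1) + [p])

def cost_terms(x0, u):
    """Batched cost split as J = J0 + p*z, where J0 does not depend on p and z = x_N^2."""
    X = x0[:, None] * F[None, :] + u @ G.T
    J0 = Q_W * x0**2 + Q_W * np.sum(X[:, :-1] ** 2, axis=1) + R_W * np.sum(u**2, axis=1)
    return X, J0, X[:, -1] ** 2

def exact_plan(x0, a0, p):
    """Exact inner MPC solve for u_1..u_{N-1} given u_0 = a0 (used only for targets/checks)."""
    w, Gf = weights(p), G[:, 1:]
    c = x0[:, None] * F[None, :] + a0[:, None] * G[None, :, 0]
    H = Gf.T @ (w[:, None] * Gf) + R_W * np.eye(N - 1)
    uf = np.linalg.solve(H, (-(c * w) @ Gf).T).T
    return np.column_stack([a0, uf])

def train(x0, a0, y, p=1.0, steps=3000, lr_u=0.05, damping=0.01):
    """Jointly update the planned inputs and p without ever solving the inner problem."""
    u = np.column_stack([a0, np.zeros((len(x0), N - 1))])  # warm start: free inputs at zero
    for _ in range(steps):
        # "Soft" inner step: one gradient step on the free inputs u_1..u_{N-1}.
        X, _, _ = cost_terms(x0, u)
        grad_u = 2.0 * (X * weights(p)) @ G + 2.0 * R_W * u
        u[:, 1:] -= lr_u * grad_u[:, 1:]  # u_0 stays fixed to the sampled action
        # Parameter step at fixed u: J is linear in p, so the least-squares fit is closed-form.
        _, J0, z = cost_terms(x0, u)
        p_fit = np.sum(z * (y - J0)) / (np.sum(z**2) + 1e-12)
        p = max(p + damping * (p_fit - p), 0.0)  # damped move, terminal weight kept >= 0
    return p, u

# Synthetic batch and Q-value targets from a "true" MPC with p_true (illustrative only).
rng = np.random.default_rng(0)
x0 = rng.uniform(-2.0, 2.0, size=64)
a0 = rng.uniform(-1.0, 1.0, size=64)
p_true = 3.0
_, J0_t, z_t = cost_terms(x0, exact_plan(x0, a0, p_true))
y = J0_t + p_true * z_t

p_learned, u_learned = train(x0, a0, y)
print(f"learned p = {p_learned:.4f}, true p = {p_true}")
```

The design choice worth noting is that the update for p is computed at fixed planned inputs, so no derivative of the optimal plan with respect to p is ever formed. At an unconstrained optimal plan, the partial derivative of J with respect to p at fixed inputs equals the derivative of the optimal cost (the envelope theorem), which is why this shortcut becomes consistent as the plan converges. In this toy, exact_plan is used only to build the synthetic targets and to check the result.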