Abstract
This work proposes an approach that integrates reinforcement learning and model predictive control (MPC) to efficiently solve finite-horizon optimal control problems in mixed-logical dynamical systems. Optimization-based control of such systems with discrete and continuous decision variables entails the online solution of mixed-integer quadratic or linear programs, which suffer from the curse of dimensionality. Our approach aims to mitigate this issue by effectively decoupling the decision on the discrete variables from the decision on the continuous variables. Moreover, to mitigate the combinatorial growth in the number of possible actions due to the prediction horizon, we define decoupled Q-functions that make the learning problem more tractable. The use of reinforcement learning reduces the online optimization problem of the MPC controller from a mixed-integer linear (quadratic) program to a linear (quadratic) program, greatly reducing the computational time. Simulation experiments for a microgrid, based on real-world data, demonstrate that the proposed method significantly reduces the online computation time of the MPC approach and that it generates policies with small optimality gaps and high feasibility rates.