Model-Free Adaptive Optimal Control of Sequential Manufacturing Processes using Reinforcement Learning

Abstract

A self-learning optimal control algorithm for sequential manufacturing processes with time-discrete control actions is proposed and evaluated on a simulated deep drawing process. The control model is built during consecutive process executions under optimal control via Reinforcement Learning, using the measured product quality as reward after each process execution. Prior model formulation, which is required by state-of-the-art algorithms like Model Predictive Control and Approximate Dynamic Programming, is therefore obsolete. This avoids several difficulties, namely in system identification, accurate modelling, and runtime complexity, that arise when dealing with processes subject to nonlinear dynamics and stochastic influences. Instead of using pre-created process and observation models, value function-based Reinforcement Learning algorithms build functions of expected future reward, which are used to derive optimal process control decisions. The expectation functions are learned online, by interacting with the process. The proposed algorithm takes stochastic variations of the process conditions into account and is able to cope with partial observability. A Q-learning-based method for adaptive optimal control of partially observable fixed-horizon manufacturing processes is developed and studied. The resulting algorithm is instantiated and evaluated by applying it to a simulated stochastic optimal control problem in metal sheet deep drawing.
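
To make the idea of learning expected future reward online more concrete, the following is a minimal sketch of tabular Q-learning on a fixed-horizon process whose only reward is the quality measured after the final step. It is not the algorithm developed in the paper: the toy process (an integer state, three actions, a random disturbance, a target value of 2), the time-indexed Q-table, and all names and parameter values are illustrative assumptions, and the state is taken to be fully observable, so partial observability is not covered.

```python
import random


def run_q_learning(episodes=5000, horizon=3, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical fixed-horizon toy process.

    All dynamics, rewards, and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    actions = (-1, 0, 1)   # assumed discrete control actions
    target = 2             # assumed ideal final state (best product quality)

    # One table per time step, because the horizon is fixed:
    # q[t][(state, action)] estimates the expected final reward.
    q = [dict() for _ in range(horizon)]

    def q_value(t, s, a):
        # Unvisited pairs default to 0.0, which is optimistic here
        # because every reward in this toy process is <= 0.
        return q[t].get((s, a), 0.0)

    def greedy(t, s):
        return max(actions, key=lambda a: q_value(t, s, a))

    for _ in range(episodes):
        s = 0  # every process execution starts from the same state
        for t in range(horizon):
            # epsilon-greedy exploration
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = greedy(t, s)

            # Assumed stochastic dynamics: action plus a random disturbance.
            noise = rng.choices((-1, 0, 1), weights=(0.1, 0.8, 0.1))[0]
            s_next = s + a + noise

            if t == horizon - 1:
                # Reward arrives only after the last step (measured quality).
                td_target = -abs(s_next - target)
            else:
                # No intermediate reward, no discounting: bootstrap from t + 1.
                td_target = max(q_value(t + 1, s_next, b) for b in actions)

            # Standard Q-learning update toward the target.
            q[t][(s, a)] = q_value(t, s, a) + alpha * (td_target - q_value(t, s, a))
            s = s_next

    return q, greedy


if __name__ == "__main__":
    q, greedy = run_q_learning()
    # Learned greedy action for each time step, starting from state 0.
    print([greedy(t, 0) for t in range(len(q))])
```

Indexing the Q-table by time step is one simple way to handle a fixed horizon, since the best action in a given state can depend on how many steps remain. No process model is supplied to the learner; it only observes states and the final reward while interacting with the simulated process.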
