A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control

Abstract

Deep reinforcement learning for high-dimensional, hierarchical control tasks usually requires complex neural networks as function approximators, which can lead to inefficiency, instability, and even divergence in the training process. Here, we introduce stacked deep Q learning (SDQL), a flexible, modularized deep reinforcement learning architecture that enables finding the optimal control policy for control tasks consisting of multiple linear stages in a stable and efficient way. SDQL exploits the linear stage structure by approximating the Q function via a collection of deep Q sub-networks stacked along an axis marking the stage-wise progress of the whole task. By back-propagating the learned state values from later stages to earlier stages, all sub-networks co-adapt to maximize the total reward of the whole task, although each sub-network is responsible for learning the optimal control policy for its own stage. This modularized architecture offers considerable flexibility in terms of environment and policy modeling, as it allows different state spaces, action spaces, reward structures, and Q networks to be chosen for each stage. Further, the backward stage-wise training procedure of SDQL offers additional transparency, stability, and flexibility to the training process, thus facilitating model fine-tuning and hyper-parameter search. We demonstrate that SDQL is capable of learning competitive strategies for problems characterized by high-dimensional state spaces, heterogeneous action spaces (both discrete and continuous), multiple scales, and sparse and delayed rewards.
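
The backward, stage-wise value propagation described above can be illustrated with a minimal sketch. It is not the SDQL implementation: tabular Q iteration stands in for the deep Q sub-networks, and the two-stage toy task, the `EXIT` marker, and the names `solve_stage` and `train_backward` are all hypothetical, introduced only for illustration. Each stage has its own state and action space. The last stage is solved first, and its learned state values are then used as the bootstrap target for transitions that leave the earlier stage.

```python
import numpy as np

EXIT = -1  # hypothetical marker: this transition leaves the current stage


def solve_stage(next_state, exit_to, reward, v_next_stage, gamma=0.9, sweeps=50):
    """Tabular Q iteration for one stage (a stand-in for one Q sub-network).

    next_state[s, a]: successor state inside the stage, or EXIT.
    exit_to[s, a]:    entry state of the next stage (used only where EXIT).
    reward[s, a]:     immediate reward.
    v_next_stage:     state values already learned for the next stage.
    """
    leaves = next_state == EXIT
    q = np.zeros(reward.shape)
    for _ in range(sweeps):
        v = q.max(axis=1)
        inside = v[np.where(leaves, 0, next_state)]           # bootstrap within the stage
        outside = v_next_stage[np.where(leaves, exit_to, 0)]  # bootstrap from the later stage
        q = reward + gamma * np.where(leaves, outside, inside)
    return q


def train_backward(stages, v_terminal):
    """Solve stages from last to first, passing state values backward."""
    v_next, q_tables = v_terminal, []
    for stage in reversed(stages):
        q = solve_stage(v_next_stage=v_next, **stage)
        q_tables.append(q)
        v_next = q.max(axis=1)
    return q_tables[::-1]


# Stage 1: two states, two actions. Action 1 always pays 1 now but exits into
# the poor entry state of stage 2; action 0 pays nothing but leads to the good one.
stage1 = dict(
    next_state=np.array([[1, EXIT], [EXIT, EXIT]]),
    exit_to=np.array([[0, 0], [1, 0]]),
    reward=np.array([[0.0, 1.0], [0.0, 1.0]]),
)
# Stage 2: two entry states, a single action each (a different action space).
stage2 = dict(
    next_state=np.array([[EXIT], [EXIT]]),
    exit_to=np.array([[0], [0]]),
    reward=np.array([[0.0], [10.0]]),
)

q_tables = train_backward([stage1, stage2], v_terminal=np.zeros(1))
greedy_stage1 = q_tables[0].argmax(axis=1)  # greedy action per stage-1 state
```

In this toy task the stage-1 policy gives up its immediate reward because the values propagated back from stage 2 make the delayed payoff larger, which is the co-adaptation effect the abstract refers to.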
