Stepping Out of the Shadows: Reinforcement Learning in Shadow Mode

Abstract

Reinforcement learning (RL) is not yet competitive for many cyber-physical systems, such as robotics, process automation, and power systems, because training on a system with physical components cannot be accelerated, and simulation models either do not exist or suffer from a large simulation-to-reality gap. During the long training time, expensive equipment cannot be used and might even be damaged by inappropriate actions of the reinforcement learning agent. Our novel approach addresses exactly this problem: we train the reinforcement learning agent in a so-called shadow mode with the assistance of an existing conventional controller, which does not have to be trained and immediately performs reasonably well. In shadow mode, the agent relies on the controller to provide action samples and guidance towards favourable states to learn the task, while simultaneously estimating for which states the learned agent will receive a higher reward than the conventional controller. The RL agent then controls the system for these states, and all other regions remain under the control of the existing controller. Over time, the RL agent takes over for an increasing number of states, while leaving control to the baseline wherever it cannot surpass the baseline's performance. Thus, we keep regret during training low and improve performance compared to using only conventional controllers or only reinforcement learning. We present and evaluate two mechanisms for deciding whether to use the RL agent or the conventional controller. The usefulness of our approach is demonstrated on a reach-avoid task, for which we are able to effectively train an agent where standard approaches fail.
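
The per-state handover described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the two decision mechanisms, so the rule below (compare an estimated value of the agent's action against that of the controller's action, with an optional margin) is only an assumption for illustration and not the authors' method. All names (`select_action`, `q_estimate`, `margin`, and the toy functions) are hypothetical.

```python
from typing import Callable, Sequence, Tuple

State = Sequence[float]
Action = float


def select_action(
    state: State,
    controller: Callable[[State], Action],
    agent: Callable[[State], Action],
    q_estimate: Callable[[State, Action], float],
    margin: float = 0.0,
) -> Tuple[Action, str]:
    """Return the action to apply and the name of the policy that supplied it.

    Hypothetical gating rule: the agent takes over only in states where its
    proposed action is estimated to beat the controller's action by `margin`.
    """
    a_ctrl = controller(state)
    a_rl = agent(state)
    if q_estimate(state, a_rl) > q_estimate(state, a_ctrl) + margin:
        return a_rl, "agent"
    # Default: the existing controller stays in charge.
    return a_ctrl, "controller"


# Toy 1-D setting, made up for illustration: drive the state towards 0.
def toy_controller(state: State) -> Action:
    return -0.5 * state[0]  # cautious proportional controller


def toy_agent(state: State) -> Action:
    return -1.0 * state[0]  # stands in for a learned policy


def toy_q(state: State, action: Action) -> float:
    # Assumed value estimate: negative distance to 0 after applying the action.
    return -abs(state[0] + action)


action, source = select_action([2.0], toy_controller, toy_agent, toy_q)
```

In this toy setting the agent is selected for the state `[2.0]` because its action is estimated to be better. Raising `margin` makes the handover more conservative, so the controller keeps control unless the estimated improvement is large.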
