Intelligent Trainer for Model-Based Reinforcement Learning

Abstract

Model-based deep reinforcement learning (DRL) algorithms use data sampled from a real environment to learn the underlying system dynamics and construct an approximate cyber environment. Training the target controller on synthesized data generated from the cyber environment can reduce the training cost significantly. In current research, issues such as the applicability of the approximate model and the strategy for sampling and training from the real and cyber environments have not been fully investigated. To address these issues, we propose an intelligent trainer that makes proper use of the approximate model and controls the sampling and training procedure in model-based DRL. To do so, we package the training process of a model-based DRL algorithm as a standard RL environment and design an RL trainer to control the training process. The trainer has three control actions: the first controls where to sample in the real and cyber environments; the second determines how much data should be sampled from the cyber environment; and the third controls how many times the cyber data should be used to train the target controller. Without the trainer, these actions would have to be set manually. The proposed framework is evaluated on five different tasks from OpenAI Gym, and the test results show that the proposed trainer achieves significantly better performance than a fixed-parameter model-based RL baseline algorithm. In addition, we compare the performance of the intelligent trainer to that of a random trainer and show that the intelligent trainer can indeed learn on the fly. The proposed training framework can be extended to more control actions with a more sophisticated trainer design to further reduce the tuning cost of model-based RL algorithms.
