Intelligent Trainer for Model-Based Reinforcement Learning

Abstract

Model-based reinforcement learning (MBRL) has been proposed as a promising alternative for tackling the high sampling cost of canonical reinforcement learning (RL): it leverages a learned model to generate synthesized data for policy training. The MBRL framework, nevertheless, is inherently limited by the convoluted process of jointly learning the control policy and configuring hyper-parameters (e.g., global/local models, real and synthesized data, etc.). The training process can be tedious and prohibitively costly. In this research, we propose a "reinforcement on reinforcement" (RoR) architecture that decomposes the convoluted tasks into two layers of reinforcement learning. The inner layer is the canonical model-based RL training process environment (TPE), which learns the control policy for the underlying system and exposes interfaces to access states, actions and rewards. The outer layer presents an RL agent, called the AI trainer, that learns an optimal hyper-parameter configuration for the inner TPE. This decomposition provides the flexibility to implement different trainer designs, an approach we call "train the trainer". We propose and optimize two alternative trainer designs: 1) a uni-head trainer and 2) a multi-head trainer. Our proposed RoR framework is evaluated on five tasks in OpenAI Gym (i.e., Pendulum, Mountain Car, Reacher, Half Cheetah and Swimmer). Compared to three other baseline algorithms, our proposed Train-the-Trainer algorithm achieves competitive auto-tuning performance, with up to 56% expected sampling cost saving without knowing the best parameter setting in advance. The proposed trainer framework can be easily extended to other cases in which hyper-parameter tuning is costly.
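
The two-layer decomposition can be illustrated with a minimal sketch. This is not the paper's implementation: the class names `ToyTPE` and `BanditTrainer`, the single hyper-parameter `real_ratio` (the share of real versus synthesized data), and the toy reward are all hypothetical, and a simple epsilon-greedy bandit stands in for the outer RL agent. The only part taken from the abstract is the structure: an inner TPE that exposes states, actions and rewards, and an outer trainer that chooses the hyper-parameter configuration.

```python
import random


class ToyTPE:
    """Hypothetical stand-in for the inner training process environment (TPE)."""

    def __init__(self, best_ratio=0.6):
        self.best_ratio = best_ratio  # assumed optimum, unknown to the trainer
        self.performance = 0.0

    def reset(self):
        self.performance = 0.0
        return self.performance  # state exposed to the trainer

    def step(self, real_ratio):
        # Toy dynamics: the inner policy improves more when the chosen
        # hyper-parameter is close to the (hidden) optimum.
        reward = 1.0 - abs(real_ratio - self.best_ratio)
        self.performance += reward
        return self.performance, reward  # next state, reward


class BanditTrainer:
    """Epsilon-greedy bandit used here in place of a full RL trainer."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.actions)
        self.values = [0.0] * len(self.actions)

    def select(self):
        # Try every action once before exploiting.
        for i, count in enumerate(self.counts):
            if count == 0:
                return i
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        return max(range(len(self.actions)), key=lambda i: self.values[i])

    def update(self, i, reward):
        # Incremental mean of the rewards observed for action i.
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]


def run(steps=200):
    tpe = ToyTPE()
    trainer = BanditTrainer([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
    tpe.reset()
    for _ in range(steps):
        i = trainer.select()                       # outer layer picks a setting
        _, reward = tpe.step(trainer.actions[i])   # inner layer trains one step
        trainer.update(i, reward)
    best = max(range(len(trainer.actions)), key=lambda i: trainer.values[i])
    return trainer.actions[best]
```

Calling `run()` returns the setting the toy trainer rates highest. In a real system the TPE step would run actual model-based policy training, and the reward would reflect measured policy improvement rather than a closed-form expression.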
