Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

  • 2020-02-12 20:35:31
  • Ge Liu, Rui Wu, Heng-Tze Cheng, Jing Wang, Jayden Ooi, Lihong Li, Ang Li, Wai Lok Sibon Li, Craig Boutilier, Ed Chi

Abstract

Deep Reinforcement Learning (RL) has proven powerful for decision making in simulated environments. However, training deep RL models is challenging in real-world applications such as production-scale health-care or recommender systems because of the expense of interaction and the limited budget at deployment. One aspect of the data inefficiency comes from the expensive hyper-parameter tuning when optimizing deep neural networks. We propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient training algorithm that allows sharing of experience collected by a behavior policy that is adaptively selected from a pool of agents trained with an ensemble of hyper-parameters. We further extend ABPS to evolve hyper-parameters during training by hybridizing ABPS with an adapted version of Population Based Training (ABPS-PBT). We conduct experiments with multiple Atari games with up to 16 hyper-parameter/architecture setups. ABPS achieves superior overall performance, reduced variance on the top 25% of agents, and equivalent performance on the best agent compared to conventional hyper-parameter tuning with independent training, even though ABPS only requires the same number of environmental interactions as training a single agent. We also show that ABPS-PBT further improves the convergence speed and reduces the variance.
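The core idea in the abstract (one adaptively chosen behavior policy collects experience, and every agent in the pool learns from it) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy chain environment `ChainEnv`, the tabular `QAgent`, the epsilon-greedy selection over a running average of episode returns, and the function name `train_abps_sketch`. The abstract does not state which selection rule ABPS uses, so this should be read as one plausible instance rather than the authors' method.

```python
import random
from collections import deque


class ChainEnv:
    """Hypothetical toy chain: action 1 moves right, action 0 moves left.
    Reward 1.0 whenever the agent is in the right-most state."""

    def __init__(self, n=5, horizon=20):
        self.n, self.horizon = n, horizon

    def reset(self):
        self.s, self.t = 0, 0
        return self.s

    def step(self, a):
        self.s = min(self.n - 1, self.s + 1) if a == 1 else max(0, self.s - 1)
        self.t += 1
        reward = 1.0 if self.s == self.n - 1 else 0.0
        return self.s, reward, self.t >= self.horizon


class QAgent:
    """Tabular Q-learning agent; the learning rate is the tuned hyper-parameter."""

    def __init__(self, n_states, lr, gamma=0.9, eps=0.1):
        self.q = [[0.0, 0.0] for _ in range(n_states)]
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def act(self, s, rng):
        # Explore with probability eps, and break ties randomly.
        if rng.random() < self.eps or self.q[s][0] == self.q[s][1]:
            return rng.randrange(2)
        return 0 if self.q[s][0] > self.q[s][1] else 1

    def update(self, s, a, r, s2, done):
        target = r + (0.0 if done else self.gamma * max(self.q[s2]))
        self.q[s][a] += self.lr * (target - self.q[s][a])


def train_abps_sketch(learning_rates, episodes=200, select_eps=0.2,
                      batch=32, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    agents = [QAgent(env.n, lr) for lr in learning_rates]
    buffer = deque(maxlen=5000)          # experience shared by all agents
    avg_return = [0.0] * len(agents)     # running mean return per agent
    counts = [0] * len(agents)
    env_steps = 0

    for _ in range(episodes):
        # Assumed selection rule: epsilon-greedy over running mean returns.
        if rng.random() < select_eps:
            k = rng.randrange(len(agents))
        else:
            k = max(range(len(agents)), key=lambda i: avg_return[i])

        # Only the selected agent interacts with the environment.
        s, done, ep_ret = env.reset(), False, 0.0
        while not done:
            a = agents[k].act(s, rng)
            s2, r, done = env.step(a)
            buffer.append((s, a, r, s2, done))
            s, ep_ret = s2, ep_ret + r
            env_steps += 1

        counts[k] += 1
        avg_return[k] += (ep_ret - avg_return[k]) / counts[k]

        # Every agent learns off-policy from the shared buffer.
        for agent in agents:
            for transition in rng.sample(list(buffer), min(batch, len(buffer))):
                agent.update(*transition)

    return agents, avg_return, env_steps
```

Calling `train_abps_sketch([0.05, 0.2, 0.5])` trains three agents while spending `episodes * horizon` environment steps in total, the same budget a single agent would use, which mirrors the data-efficiency claim in the abstract.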

 
