Abstract
Deep Reinforcement Learning (RL) experiments are commonly performed in simulated environments because deep neural networks demand a tremendous number of training samples. However, model-based Deep Bayesian RL methods, such as Deep PILCO, allow a robot to learn good policies within a few trials in the real world. Although Deep PILCO has been applied to many single-robot tasks, here we propose, for the first time, an application of Deep PILCO to a multi-robot confrontation game, and compare it with a model-free Deep RL algorithm, Deep Q-Learning. Our experiments show that Deep PILCO significantly outperforms Deep Q-Learning in learning efficiency and scalability. We conclude that sample-efficient Deep Bayesian learning algorithms have great prospects for competitive games in which the agent aims to beat its opponents in the real world, as opposed to in simulated applications.