Deep vs. Deep Bayesian: Reinforcement Learning on a Multi-Robot Competitive Experiment

Abstract

Deep Reinforcement Learning (RL) experiments are commonly performed in simulated environments, because deep neural networks demand a tremendous number of training samples. However, model-based Deep Bayesian RL, such as Deep PILCO, allows a robot to learn good policies within a few trials in the real world. Although Deep PILCO has been applied to many single-robot tasks, here we propose, for the first time, an application of Deep PILCO to a multi-robot confrontation game, and compare the algorithm with a model-free Deep RL algorithm, Deep Q-Learning. Our experiments show that Deep PILCO significantly outperforms Deep Q-Learning in learning efficiency and scalability. We conclude that sample-efficient Deep Bayesian learning algorithms have great prospects in competitive games where the agent aims to defeat its opponents in the real world, as opposed to simulated applications.
