Abstract
Deep Reinforcement Learning (DRL) experiments are commonly performed in simulated environments because deep neural networks demand a tremendous number of training samples. In contrast, model-based Bayesian learning allows a robot to learn good policies within a few trials in the real world. Although methods such as Deep PILCO have been applied to many single-robot tasks, here we propose an application of Deep PILCO to finding optimal solutions to the problem of winning a multi-robot combat game. We compare this deep Bayesian learning algorithm with a model-free Deep RL algorithm, Deep Q-Learning, by analyzing results collected from simulations and real-world experiments. In this game, the RL algorithms' inputs are noisy and unstable due to the filtered LiDAR sensory signal. Surprisingly, our experiments show that the sample-efficient Deep Bayesian RL outperforms DRL, even when the results of real-world Deep Bayesian RL are compared with those of simulation-based Deep Q-Learning. Our results point to the advantage of bypassing the reality gap by learning in the real world, with faster learning than in simulation.