Distributional Reinforcement Learning with Maximum Mean Discrepancy

Abstract

Distributional reinforcement learning (RL) has achieved state-of-the-art performance in Atari games by recasting traditional RL as a distribution estimation problem, explicitly estimating the probability distribution of the total return instead of its expectation. The bottleneck in distributional RL lies in the estimation of this distribution, where one must resort to an approximate representation of the return distributions, which are infinite-dimensional. Most existing methods focus on learning a set of predefined statistic functionals of the return distributions, requiring involved projections to maintain the order statistics. We take a different perspective using deterministic sampling, wherein we approximate the return distributions with a set of deterministic particles that are not attached to any predefined statistic functional, allowing us to freely approximate the return distributions. The learning is then interpreted as the evolution of these particles so that a distance between the return distribution and its target distribution is minimized. This learning aim is realized via the maximum mean discrepancy (MMD) distance, which in turn leads to a simpler loss amenable to backpropagation. Experiments on the suite of Atari 2600 games show that our algorithm outperforms the standard distributional RL baselines and sets a new record in the Atari games for non-distributed agents.
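
To make the particle view concrete, the following is a minimal sketch of a squared MMD estimate between two sets of scalar particles, one standing in for the current return distribution and one for its target. The abstract does not specify the kernel or the estimator, so the Gaussian kernel, the biased (V-statistic) estimator, and the names `gaussian_kernel`, `mmd_squared`, and `bandwidth` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    # Assumed kernel choice: Gaussian (RBF) on scalar returns.
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(particles, targets, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two particle sets.

    Both arguments are plain lists of floats; this is an illustrative
    stand-in for the return particles and the target particles.
    """
    n, m = len(particles), len(targets)
    # Average kernel value within the current particles.
    k_xx = sum(gaussian_kernel(a, b, bandwidth)
               for a in particles for b in particles) / (n * n)
    # Average kernel value within the target particles.
    k_yy = sum(gaussian_kernel(a, b, bandwidth)
               for a in targets for b in targets) / (m * m)
    # Average kernel value across the two sets.
    k_xy = sum(gaussian_kernel(a, b, bandwidth)
               for a in particles for b in targets) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    current = [0.0, 1.0, 2.0]   # hypothetical return particles
    target = [0.5, 1.5, 2.5]    # hypothetical target particles
    print(mmd_squared(current, target))
```

Because the estimate is a smooth function of the particle values, the same expression written in an automatic-differentiation framework could be minimized directly with respect to the particles, which is the sense in which such a loss is amenable to backpropagation.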
