Fully Parameterized Quantile Function for Distributional Reinforcement Learning

Abstract

Distributional Reinforcement Learning (RL) differs from traditional RL in that it estimates the distribution of total returns rather than their expectation, and it has achieved state-of-the-art performance on Atari games. The key challenge in practical distributional RL algorithms lies in how to parameterize the estimated distributions so as to better approximate the true continuous distribution. Existing distributional RL algorithms parameterize either the probability side or the return value side of the distribution function, leaving the other side either uniformly fixed, as in C51 and QR-DQN, or randomly sampled, as in IQN. In this paper, we propose a fully parameterized quantile function that parameterizes both the quantile fraction axis (i.e., the x-axis) and the value axis (i.e., the y-axis) for distributional RL. Our algorithm contains a fraction proposal network that generates a discrete set of quantile fractions and a quantile value network that gives the corresponding quantile values. The two networks are jointly trained to find the best approximation of the true distribution. Experiments on 55 Atari games show that our algorithm significantly outperforms existing distributional RL algorithms and sets a new record on the Arcade Learning Environment for non-distributed agents.
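
To make the two-axis idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that quantile fractions are obtained from unconstrained scores by a softmax followed by a cumulative sum, and that the expected return is approximated by evaluating the quantile function at the midpoint of each fraction interval; the abstract specifies neither choice. The names propose_fractions, quantile_value, and expected_return are hypothetical, and quantile_value is a fixed analytic stand-in for a learned quantile value network.

```python
import math


def propose_fractions(logits):
    """Map N unconstrained scores to N+1 increasing fractions in [0, 1].

    Assumption: softmax followed by a cumulative sum. The abstract only
    says that a network generates a discrete set of quantile fractions.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift by max for stability
    total = sum(exps)
    taus = [0.0]
    for e in exps:
        taus.append(taus[-1] + e / total)
    taus[-1] = 1.0  # guard against floating-point drift in the last entry
    return taus


def quantile_value(tau):
    """Hypothetical stand-in for a quantile value network.

    Returns the quantile function of a return distributed as Uniform(0, 10).
    """
    return 10.0 * tau


def expected_return(taus, quantile_fn):
    """Approximate the mean return as a step-function integral.

    Each interval [lo, hi] contributes its width times the quantile value
    at the interval midpoint (an assumption made for this sketch).
    """
    total = 0.0
    for lo, hi in zip(taus[:-1], taus[1:]):
        mid = 0.5 * (lo + hi)
        total += (hi - lo) * quantile_fn(mid)
    return total


logits = [0.3, -1.2, 2.0, 0.0]  # illustrative scores, not learned values
taus = propose_fractions(logits)
mean_estimate = expected_return(taus, quantile_value)
```

Because the stand-in quantile function is linear, the midpoint approximation recovers the true mean of 5 up to floating-point error for any choice of fractions. With a nonlinear quantile function the approximation error depends on where the fractions are placed, which is why learning the fractions rather than fixing or sampling them can help.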
