Deep Reinforcement Learning with Weighted Q-Learning

Abstract

Reinforcement learning algorithms based on Q-Learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems and achieving super-human performance on many of them. Nevertheless, Q-Learning is known to be positively biased, since it learns by using the maximum over noisy estimates of expected values. Systematic overestimation of the action values, coupled with the inherently high variance of DRL methods, can lead to incrementally accumulating errors, causing learning algorithms to diverge. Ideally, we would like DRL agents to take into account their own uncertainty about the optimality of each action, and to be able to exploit it to make more informed estimates of the expected return. In this regard, Weighted Q-Learning (WQL) effectively reduces bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action values, where the weights correspond to the probability of each action value being the maximum; however, the computation of these probabilities is only practical in the tabular setting. In this work, we provide methodological advances to benefit from the WQL properties in DRL, by using neural networks trained with Dropout as an effective approximation of deep Gaussian processes. In particular, we adopt the Concrete Dropout variant to obtain calibrated estimates of epistemic uncertainty in DRL. The estimator is then obtained by taking several stochastic forward passes through the action-value network and computing the weights in a Monte Carlo fashion. Such weights are Bayesian estimates of the probability of each action value corresponding to the maximum w.r.t. a posterior probability distribution estimated by Dropout. We show how our novel Deep Weighted Q-Learning algorithm reduces the bias w.r.t. relevant baselines, and we provide empirical evidence of its advantages on representative benchmarks.
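
The Monte Carlo weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `weighted_estimate` and `simulate_passes` are hypothetical, the dropout forward passes are replaced by Gaussian noise around fixed values, and the per-action sample mean is assumed to serve as the "estimated action value" in the weighted sum.

```python
import random


def weighted_estimate(samples):
    """Compute Monte Carlo weights and the weighted value estimate.

    samples: list of rows, one row per stochastic forward pass,
             each row holding one sampled value per action.
    """
    n_passes = len(samples)
    n_actions = len(samples[0])

    # Weight of an action = fraction of passes in which it attains the maximum
    # (ties go to the lowest action index).
    counts = [0] * n_actions
    for row in samples:
        best = max(range(n_actions), key=row.__getitem__)
        counts[best] += 1
    weights = [c / n_passes for c in counts]

    # Assumption: the estimated action value is the mean over passes.
    means = [sum(row[a] for row in samples) / n_passes for a in range(n_actions)]

    # Weighted sum of the estimated action values.
    estimate = sum(w * m for w, m in zip(weights, means))
    return weights, means, estimate


def simulate_passes(true_values, noise_std, n_passes, rng):
    """Hypothetical stand-in for dropout passes: Gaussian noise around fixed values."""
    return [[rng.gauss(v, noise_std) for v in true_values] for _ in range(n_passes)]
```

For example, `weighted_estimate(simulate_passes([0.0, 0.0, 0.0], 1.0, 1000, random.Random(0)))` returns weights that sum to one and an estimate that never exceeds the largest per-action mean, since the estimate is a convex combination of those means. This is the sense in which the weighted estimator is less prone to overestimation than taking the maximum.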
