Variance Reduced Advantage Estimation with $\delta$ Hindsight Credit Assignment

Abstract

Hindsight Credit Assignment (HCA) refers to a recently proposed family of methods for producing more efficient credit assignment in reinforcement learning. These methods work by explicitly estimating the probability that certain actions were taken in the past given present information. Prior work has studied the properties of such methods and demonstrated their behaviour empirically. We extend this work by introducing a particular HCA algorithm which has provably lower variance than the conventional Monte-Carlo estimator when the necessary functions can be estimated exactly. This result provides a strong theoretical basis for how HCA could be broadly useful.
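The central idea, estimating the probability that an action was taken given what happened afterwards, can be illustrated with a toy example. The sketch below is not the δ-HCA algorithm introduced here. It is a minimal illustration of a return-conditioned form of hindsight credit assignment, assuming a hypothetical one-state problem with two actions and a known, discrete return distribution. The hindsight probability h(a | s, z) is computed exactly with Bayes' rule, and the identity Q(s, a) = E_{Z ~ π}[Z h(a | s, Z) / π(a | s)] is checked with exact rational arithmetic. All names (`PI`, `P_Z_GIVEN_A`, `hindsight`, `q_hca`) are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy problem (illustration only): one state s, two actions,
# and a discrete return Z whose distribution depends on the action taken.
PI = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # policy pi(a|s)
P_Z_GIVEN_A = {
    0: {0: Fraction(1, 2), 2: Fraction(1, 2)},  # action 0 yields Z = 0 or 2
    1: {2: Fraction(1, 2), 4: Fraction(1, 2)},  # action 1 yields Z = 2 or 4
}


def return_marginal():
    """P(Z = z | s) when the action is drawn from pi."""
    p = {}
    for a, pa in PI.items():
        for z, pz in P_Z_GIVEN_A[a].items():
            p[z] = p.get(z, Fraction(0)) + pa * pz
    return p


def hindsight(a, z):
    """Exact hindsight probability h(a | s, Z = z) via Bayes' rule."""
    return PI[a] * P_Z_GIVEN_A[a].get(z, Fraction(0)) / return_marginal()[z]


def q_true(a):
    """Q(s, a) = E[Z | s, a], computed directly from the return model."""
    return sum(z * p for z, p in P_Z_GIVEN_A[a].items())


def q_hca(a):
    """E_{Z ~ pi}[Z * h(a | s, Z) / pi(a | s)]: every return, whichever
    action produced it, is weighted into the estimate for action a."""
    return sum(p * z * hindsight(a, z) / PI[a]
               for z, p in return_marginal().items())


for a in PI:
    print(a, q_true(a), q_hca(a))  # prints "0 1 1" then "1 3 3"
```

Because every sampled return contributes to the value estimate of every action, the hindsight weighting can make use of data that a conventional Monte-Carlo estimate for a single action would discard. Here h is known exactly. In practice it must be learned, and the variance guarantee stated above concerns the δ-HCA estimator, not this illustration.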
