Distributional reinforcement learning with linear function approximation

  • 2019-02-08 15:31:42
  • Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra

Abstract

Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is the analysis of the C51 algorithm by Rowland et al. (2018) in terms of the Cramér distance, but their results apply only to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramér distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramér-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramér-based distributional methods may perform worse than directly approximating the value function.
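As a rough illustration of what a Cramér-style distance on arbitrary vectors can look like, the sketch below computes the squared difference between the cumulative sums of two vectors defined on a shared, evenly spaced support. This is only a minimal sketch under stated assumptions, not the algorithm derived in the paper: the function name `cramer_distance_sq` and the `spacing` parameter are hypothetical, and treating cumulative sums of unnormalized vectors as stand-ins for cumulative distribution functions is one plausible reading of "adapting the Cramér distance to arbitrary vectors".

```python
from itertools import accumulate


def cramer_distance_sq(p, q, spacing=1.0):
    """Squared Cramér-style distance between two real vectors.

    Assumption: p and q live on the same evenly spaced support.
    They are NOT required to be non-negative or to sum to one.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Cumulative sums play the role of (possibly unnormalized) CDFs.
    cum_p = list(accumulate(p))
    cum_q = list(accumulate(q))
    # Sum of squared CDF differences, scaled by the support spacing.
    return spacing * sum((a - b) ** 2 for a, b in zip(cum_p, cum_q))


# Two point masses at opposite ends of a three-atom support.
far_apart = cramer_distance_sq([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])

# A vector with a negative entry, which has no probabilistic interpretation.
signed = cramer_distance_sq([0.5, -0.5], [0.0, 0.0])
```

Because the quantity depends only on the difference of cumulative sums, it stays well defined when the inputs are not valid probability distributions, which is the property the abstract relies on when it lets the model's prediction be any real vector.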

 
