Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning

  • 2022-06-17 18:28:40
  • Harley Wiltzer, David Meger, Marc G. Bellemare

Abstract

Continuous-time reinforcement learning offers an appealing formalism for describing control problems in which the passage of time is not naturally divided into discrete increments. Here we consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time, stochastic environment. Accurate return predictions have proven useful for determining optimal policies for risk-sensitive control, learning state representations, multiagent coordination, and more. We begin by establishing the distributional analogue of the Hamilton-Jacobi-Bellman (HJB) equation for Itô diffusions and the broader class of Feller-Dynkin processes. We then specialize this equation to the setting in which the return distribution is approximated by $N$ uniformly-weighted particles, a common design choice in distributional algorithms. Our derivation highlights additional terms due to statistical diffusivity, which arise from the proper handling of distributions in the continuous-time setting. Based on this, we propose a tractable algorithm for approximately solving the distributional HJB equation based on a JKO (Jordan-Kinderlehrer-Otto) scheme, which can be implemented in an online control algorithm. We demonstrate the effectiveness of such an algorithm in a synthetic control problem.
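To make the particle representation concrete, here is a minimal Monte Carlo sketch. It is not the JKO-based algorithm proposed in the paper; it only shows the kind of object that $N$ uniformly-weighted particles approximate. It assumes a hypothetical one-dimensional Ornstein-Uhlenbeck process dX = -theta X dt + sigma dW, a hypothetical reward r(x) = -x^2, a continuous discount rate gamma, and a finite horizon T that truncates the return integral. Returns are simulated with an Euler-Maruyama discretization and then compressed into N particles placed at the quantile midpoints (2i - 1)/(2N), a placement borrowed from quantile-based distributional RL. The function names and all parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_returns(x0, n_paths=4000, horizon=10.0, dt=0.01, gamma=0.5,
                     theta=1.0, sigma=0.5, seed=0):
    """Hypothetical helper: Monte Carlo samples of the truncated return
    G = int_0^T exp(-gamma t) r(X_t) dt for the assumed OU process
    dX = -theta X dt + sigma dW, with the assumed reward r(x) = -x**2."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    g = np.zeros(n_paths)
    n_steps = int(round(horizon / dt))
    for k in range(n_steps):
        t = k * dt
        # Left-point Riemann sum of the discounted reward integral.
        g += np.exp(-gamma * t) * (-(x ** 2)) * dt
        # Euler-Maruyama step for the Ito diffusion.
        x += -theta * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return g


def to_particles(samples, n_particles):
    """Hypothetical helper: compress samples into N uniformly weighted
    particles (each with weight 1/N) located at the quantile midpoints
    tau_i = (2i - 1) / (2N), returned in nondecreasing order."""
    taus = (2 * np.arange(1, n_particles + 1) - 1) / (2 * n_particles)
    return np.quantile(samples, taus)


# Usage: approximate the return distribution from x0 = 1 with 32 particles.
returns = simulate_returns(x0=1.0)
particles = to_particles(returns, n_particles=32)
print("particle mean:", particles.mean(), "particle std:", particles.std())
```

Because every particle carries weight 1/N, simple statistics of the particles track those of the simulated returns. The abstract instead describes obtaining such particles by approximately solving the distributional HJB equation with a JKO scheme rather than by simulating trajectories.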

 
