Smoothed functional-based gradient algorithms for off-policy reinforcement learning

  • 2021-01-06 17:06:42
  • Nithia Vijayan, Prashanth L. A


We consider the problem of control in an off-policy reinforcement learning (RL) context. We propose a policy gradient scheme that incorporates a smoothed functional-based gradient estimation scheme. We provide an asymptotic convergence guarantee for the proposed algorithm using the ordinary differential equation (ODE) approach. Further, we derive a non-asymptotic bound that quantifies the rate of convergence of the proposed algorithm.
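
To illustrate the general idea of smoothed functional (SF) gradient estimation mentioned above, here is a minimal sketch. It is not the authors' off-policy algorithm: it assumes a two-sided estimator with Gaussian perturbations, and a hypothetical quadratic `objective` stands in for the policy performance measure. The names `sf_gradient`, `sf_ascent`, `objective`, and all parameter values are illustrative assumptions.

```python
import random


def sf_gradient(f, theta, mu=0.1, num_samples=2000, seed=0):
    """Two-sided Gaussian smoothed functional estimate of the gradient of f at theta.

    Assumption: perturbations are standard Gaussian and mu is the smoothing
    parameter. The estimate only needs function evaluations, not derivatives.
    """
    rng = random.Random(seed)
    d = len(theta)
    grad = [0.0] * d
    for _ in range(num_samples):
        # Random perturbation direction.
        delta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = f([t + mu * dl for t, dl in zip(theta, delta)])
        minus = f([t - mu * dl for t, dl in zip(theta, delta)])
        # Finite difference along delta, projected back onto delta.
        scale = (plus - minus) / (2.0 * mu)
        for i in range(d):
            grad[i] += scale * delta[i] / num_samples
    return grad


def objective(theta):
    # Hypothetical stand-in for a performance measure; maximized at (1, -2).
    return -((theta[0] - 1.0) ** 2) - (theta[1] + 2.0) ** 2


def sf_ascent(f, theta, step=0.05, iters=200):
    """Gradient ascent driven by the SF estimate (illustrative step sizes)."""
    theta = list(theta)
    for k in range(iters):
        g = sf_gradient(f, theta, num_samples=50, seed=k)
        theta = [t + step * gi for t, gi in zip(theta, g)]
    return theta


if __name__ == "__main__":
    print(sf_gradient(objective, [0.0, 0.0]))  # noisy estimate of the true gradient (2, -4)
    print(sf_ascent(objective, [0.0, 0.0]))    # should approach the maximizer (1, -2)
```

In an RL setting, the function evaluations would be replaced by noisy estimates of a policy's value, which is where the off-policy machinery of the paper would come in; that part is not sketched here.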

