Smoothed functional-based gradient algorithms for off-policy reinforcement learning

Abstract

We consider the problem of control in an off-policy reinforcement learning (RL) context. We propose a policy gradient scheme that incorporates a smoothed functional-based gradient estimation scheme. We provide an asymptotic convergence guarantee for the proposed algorithm using the ordinary differential equation (ODE) approach. Further, we derive a non-asymptotic bound that quantifies the rate of convergence of the proposed algorithm.
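
To make the idea of smoothed functional gradient estimation concrete, the following is a minimal sketch of the generic two-sided Gaussian smoothed functional estimator applied to a deterministic test function. It is not the algorithm proposed in the paper: the off-policy (importance-sampling) component is omitted, and the names `sf_gradient`, `quadratic`, `mu`, and `num_samples` are assumptions introduced here for illustration only.

```python
import random


def sf_gradient(f, theta, mu=0.1, num_samples=5000, seed=0):
    """Two-sided Gaussian smoothed functional estimate of grad f(theta).

    Averages Delta * (f(theta + mu*Delta) - f(theta - mu*Delta)) / (2*mu)
    over standard Gaussian perturbations Delta. Only function evaluations
    are needed; no analytic derivative of f is used.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    d = len(theta)
    grad = [0.0] * d
    for _ in range(num_samples):
        # Draw a standard Gaussian perturbation direction.
        delta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = [t + mu * dl for t, dl in zip(theta, delta)]
        minus = [t - mu * dl for t, dl in zip(theta, delta)]
        # Finite difference of f along the random direction.
        scale = (f(plus) - f(minus)) / (2.0 * mu)
        for i in range(d):
            grad[i] += scale * delta[i] / num_samples
    return grad


def quadratic(x):
    # Hypothetical test objective; its true gradient at x is 2*x.
    return sum(xi * xi for xi in x)


estimate = sf_gradient(quadratic, [1.0, -2.0])
print(estimate)  # should be close to the true gradient [2.0, -4.0]
```

In a policy gradient setting, `f` would be replaced by a (noisy) estimate of the policy's value as a function of the policy parameter, and the estimate would drive a stochastic gradient ascent update. How that value is estimated from off-policy data is specific to the paper and is not reproduced here.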
