Abstract
We consider the problem of control in an off-policy reinforcement learning (RL) context. We propose a policy gradient scheme that incorporates a smoothed functional-based gradient estimation scheme. We provide an asymptotic convergence guarantee for the proposed algorithm using the ordinary differential equation (ODE) approach. Further, we derive a non-asymptotic bound that quantifies the rate of convergence of the proposed algorithm.
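For readers unfamiliar with smoothed functional gradient estimation, the following is a minimal sketch of the generic idea, not the algorithm proposed in this paper. It assumes a two-sided Gaussian perturbation scheme applied to a toy deterministic objective. The names `sf_gradient`, `quadratic`, `mu`, and `num_samples` are hypothetical and chosen only for illustration; the off-policy RL objective and the exact estimator used in the paper are not reproduced here.

```python
import random


def sf_gradient(f, theta, mu=0.1, num_samples=2000, seed=0):
    """Two-sided Gaussian smoothed functional estimate of the gradient of f at theta.

    Illustrative assumption: f is a black-box function of a list of floats,
    and mu is the smoothing (perturbation) parameter.
    """
    rng = random.Random(seed)
    d = len(theta)
    grad = [0.0] * d
    for _ in range(num_samples):
        # Draw a standard Gaussian perturbation direction.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = [t + mu * ui for t, ui in zip(theta, u)]
        minus = [t - mu * ui for t, ui in zip(theta, u)]
        # Finite difference along u, using function evaluations only.
        scale = (f(plus) - f(minus)) / (2.0 * mu)
        for i in range(d):
            grad[i] += scale * u[i] / num_samples
    return grad


def quadratic(x):
    # Toy objective whose true gradient is 2 * x.
    return sum(xi * xi for xi in x)


estimate = sf_gradient(quadratic, [1.0, -2.0])
print(estimate)
```

On this toy objective the estimate should be close to the true gradient 2θ, up to Monte Carlo error. In a policy gradient setting the function evaluations would instead come from (possibly noisy) estimates of the policy's value, which is why an estimator that needs only function values is attractive.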