An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method

Abstract

In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ the multi-timescale stochastic approximation variant of the popular cross entropy (CE) optimization method, a model-based search method for finding the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regard to computational efficiency, accuracy and stability.
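To make the idea of CE as a model-based search method concrete, the following is a minimal sketch of the standard batch cross entropy method with a diagonal Gaussian sampling model. It is not the multi-timescale stochastic approximation variant used by the algorithms in this paper; the function name `cross_entropy_maximize`, its parameters, and the toy objective `neg_sq_dist` are assumptions chosen only for illustration.

```python
import random
import statistics


def cross_entropy_maximize(objective, dim, n_samples=200, elite_frac=0.1,
                           n_iters=30, init_std=5.0, seed=0):
    """Batch CE sketch (illustrative only): maximize `objective` over R^dim."""
    rng = random.Random(seed)
    mean = [0.0] * dim          # mean of the Gaussian sampling model
    std = [init_std] * dim      # per-coordinate standard deviation
    n_elite = max(2, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # Draw candidate solutions from the current model.
        samples = [[rng.gauss(mean[d], std[d]) for d in range(dim)]
                   for _ in range(n_samples)]
        # Keep the best-scoring fraction (the "elite" set).
        samples.sort(key=objective, reverse=True)
        elite = samples[:n_elite]
        # Refit the model to the elite set.
        for d in range(dim):
            column = [x[d] for x in elite]
            mean[d] = statistics.mean(column)
            std[d] = statistics.pstdev(column) + 1e-6  # avoid collapse to zero
    return mean


def neg_sq_dist(x):
    """Hypothetical toy objective with its maximum at (1, -2)."""
    target = (1.0, -2.0)
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


estimate = cross_entropy_maximize(neg_sq_dist, dim=2)
print(estimate)
```

The sketch alternates between sampling from a parametric model and refitting that model to the best samples, which is the sense in which CE is "model-based". An online variant would replace the batch refit with incremental updates.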
