Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

Abstract

We study two time-scale linear stochastic approximation algorithms, which canbe used to model well-known reinforcement learning algorithms such as GTD,GTD2, and TDC. We present finite-time performance bounds for the case where thelearning rate is fixed. The key idea in obtaining these bounds is to use aLyapunov function motivated by singular perturbation theory for lineardifferential equations. We use the bound to design an adaptive learning ratescheme which significantly improves the convergence rate over the known optimalpolynomial decay rule in our experiments, and can be used to potentiallyimprove the performance of any other schedule where the learning rate ischanged at pre-determined time instants.

Quick Read (beta)

loading the full paper ...