Finite Sample Analyses for TD(0) with Function Approximation

  • 2017-12-11 08:21:21
  • Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor

Abstract

TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such results. Existing convergence rates for Temporal Difference (TD) methods apply only to somewhat modified versions, e.g., projected variants or ones where stepsizes depend on unknown problem parameters. Our analyses obviate these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high probability. The two are obtained via different approaches that use relatively unknown, recently developed stochastic approximation techniques.
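
For readers unfamiliar with the algorithm, the following is a minimal sketch of unprojected TD(0) with linear function approximation, the setting the abstract refers to. It is an illustration, not the authors' code: the function name `td0_linear`, the helper `dot`, the one-state example, and the stepsize schedule 1/(n+1)^sigma are all assumptions chosen for the sketch. The schedule uses no problem-dependent constants, which is the kind of stepsize the abstract contrasts with earlier analyses.

```python
from typing import Iterable, List, Sequence, Tuple

# One observed transition: (phi(s), reward, phi(s')), with phi a feature vector.
Transition = Tuple[Sequence[float], float, Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def td0_linear(
    transitions: Iterable[Transition],
    dim: int,
    gamma: float,
    sigma: float = 0.5,  # assumed stepsize exponent, taken in (0, 1)
) -> List[float]:
    """Plain TD(0) with linear value estimate V(s) ~ phi(s)^T theta.

    No projection step is applied, and the stepsize 1 / (n + 1)**sigma
    (an assumption of this sketch) uses no problem-dependent constants.
    """
    theta = [0.0] * dim
    for n, (phi, reward, phi_next) in enumerate(transitions):
        alpha = 1.0 / (n + 1) ** sigma
        # Temporal-difference error for the observed transition.
        delta = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
        # Semi-gradient update along the current state's features.
        theta = [t + alpha * delta * f for t, f in zip(theta, phi)]
    return theta


# Hypothetical one-state chain: reward 1 on every step, single feature phi = [1].
# The true value is 1 / (1 - gamma), i.e. 2 for gamma = 0.5.
transitions = [([1.0], 1.0, [1.0])] * 1000
theta = td0_linear(transitions, dim=1, gamma=0.5)
```

In this deterministic toy chain the iterate approaches the fixed point 2; in a stochastic environment the transitions would instead be sampled from the Markov chain induced by the policy being evaluated.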

 
