Computably Continuous Reinforcement-Learning Objectives are PAC-learnable

Abstract

In reinforcement learning, the classic objectives of maximizing discountedand finite-horizon cumulative rewards are PAC-learnable: There are algorithmsthat learn a near-optimal policy with high probability using a finite amount ofsamples and computation. In recent years, researchers have introducedobjectives and corresponding reinforcement-learning algorithms beyond theclassic cumulative rewards, such as objectives specified as linear temporallogic formulas. However, questions about the PAC-learnability of these newobjectives have remained open. This work demonstrates the PAC-learnability of general reinforcement-learningobjectives through sufficient conditions for PAC-learnability in two analysissettings. In particular, for the analysis that considers only samplecomplexity, we prove that if an objective given as an oracle is uniformlycontinuous, then it is PAC-learnable. Further, for the analysis that considerscomputational complexity, we prove that if an objective is computable, then itis PAC-learnable. In other words, if a procedure computes successiveapproximations of the objective's value, then the objective is PAC-learnable. We give three applications of our condition on objectives from the literaturewith previously unknown PAC-learnability and prove that these objectives arePAC-learnable. Overall, our result helps verify existing objectives'PAC-learnability. Also, as some studied objectives that are not uniformlycontinuous have been shown to be not PAC-learnable, our results could guide thedesign of new PAC-learnable objectives.

Quick Read (beta)

loading the full paper ...