Abstract
Reinforcement Learning (RL) problems are being considered under increasinglymore complex structures. While tabular and linear models have been thoroughlyexplored, the analytical study of RL under nonlinear function approximation,especially kernel-based models, has recently gained traction for their strongrepresentational capacity and theoretical tractability. In this context, weexamine the question of statistical efficiency in kernel-based RL within thereward-free RL framework, specifically asking: how many samples are required todesign a near-optimal policy? Existing work addresses this question underrestrictive assumptions about the class of kernel functions. We first explorethis question by assuming a generative model, then relax this assumption at thecost of increasing the sample complexity by a factor of H, the length of theepisode. We tackle this fundamental problem using a broad class of kernels anda simpler algorithm compared to prior work. Our approach derives new confidenceintervals for kernel ridge regression, specific to our RL setting, which may beof broader applicability. We further validate our theoretical findings throughsimulations.