Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models

Abstract

Reinforcement learning (RL) is a powerful tool for decision-making in uncertain environments, but it often requires large amounts of data to learn an optimal policy. We propose using prior model knowledge to guide exploration and thereby speed up learning. This model knowledge comes in the form of a model set to which the true transition kernel and reward function belong. We optimize over this model set to obtain upper and lower bounds on the Q-function, which are then used to guide the exploration of the agent. We provide theoretical guarantees on the convergence of the Q-function to the optimal Q-function under the proposed class of exploring policies. We also introduce a data-driven regularized version of the model set optimization problem that ensures the convergence of the class of exploring policies to the optimal policy. Lastly, we show that when the model set has a specific structure, namely the bounded-parameter MDP (BMDP) framework, the regularized model set optimization problem becomes convex and simple to implement. In this setting, we also show that we obtain finite-time convergence to the optimal policy under additional assumptions. We demonstrate the effectiveness of the proposed exploration strategy in a simulation study. The results indicate that the proposed method can significantly speed up the learning process in reinforcement learning.
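
To make the idea of Q-function bounds over a model set concrete, the following is a minimal sketch for the BMDP case, assuming the model set is given as elementwise intervals on the transition probabilities and rewards. It uses standard interval value iteration rather than the regularized, data-driven optimization problem described above, and the action-pruning rule in `candidate_actions` is only one plausible way to use the bounds, not necessarily the exploration rule of the paper. All names (`P_lo`, `P_hi`, `R_lo`, `R_hi`, `q_bounds`, `candidate_actions`) are hypothetical.

```python
import numpy as np

def extreme_expectation(p_lo, p_hi, v, maximize):
    """Optimize sum_s' p(s') v(s') over p_lo <= p <= p_hi with sum(p) = 1."""
    # Visit successor states from most to least favorable for the objective.
    order = np.argsort(-v if maximize else v)
    p = p_lo.copy()
    slack = 1.0 - p.sum()  # probability mass left after meeting lower bounds
    for s in order:
        add = min(p_hi[s] - p_lo[s], slack)
        p[s] += add
        slack -= add
    return float(p @ v)

def q_bounds(P_lo, P_hi, R_lo, R_hi, gamma=0.9, iters=500):
    """Interval value iteration: lower and upper bounds on the optimal Q."""
    n_s, n_a = R_lo.shape
    Q_lo = np.zeros((n_s, n_a))
    Q_hi = np.zeros((n_s, n_a))
    for _ in range(iters):
        # Values from the previous sweep (max returns fresh arrays).
        V_lo, V_hi = Q_lo.max(axis=1), Q_hi.max(axis=1)
        for s in range(n_s):
            for a in range(n_a):
                Q_lo[s, a] = R_lo[s, a] + gamma * extreme_expectation(
                    P_lo[s, a], P_hi[s, a], V_lo, maximize=False)
                Q_hi[s, a] = R_hi[s, a] + gamma * extreme_expectation(
                    P_lo[s, a], P_hi[s, a], V_hi, maximize=True)
    return Q_lo, Q_hi

def candidate_actions(Q_lo, Q_hi, s):
    """Assumed pruning rule: keep actions whose upper bound is not
    dominated by the best lower bound in state s."""
    return np.flatnonzero(Q_hi[s] >= Q_lo[s].max())
```

An agent could then restrict its exploratory actions in state `s` to `candidate_actions(Q_lo, Q_hi, s)`, since any action outside that set cannot be optimal for any model in the interval set.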
