Abstract
Many methods for Model-based Reinforcement learning (MBRL) in Markov decisionprocesses (MDPs) provide guarantees for both the accuracy of the model they candeliver and the learning efficiency. At the same time, state abstractiontechniques allow for a reduction of the size of an MDP while maintaining abounded loss with respect to the original problem. Therefore, it may come as asurprise that no such guarantees are available when combining both techniques,i.e., where MBRL merely observes abstract states. Our theoretical analysisshows that abstraction can introduce a dependence between samples collectedonline (e.g., in the real world). That means that, without taking thisdependence into account, results for MBRL do not directly extend to thissetting. Our result shows that we can use concentration inequalities formartingales to overcome this problem. This result makes it possible to extendthe guarantees of existing MBRL algorithms to the setting with abstraction. Weillustrate this by combining R-MAX, a prototypical MBRL algorithm, withabstraction, thus producing the first performance guarantees for model-based'RL from Abstracted Observations': model-based reinforcement learning with anabstract model.