Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information

Abstract

In real-world reinforcement learning applications the learner's observationspace is ubiquitously high-dimensional with both relevant and irrelevantinformation about the task at hand. Learning from high-dimensional observationshas been the subject of extensive investigation in supervised learning andstatistics (e.g., via sparsity), but analogous issues in reinforcement learningare not well understood, even in finite state/action (tabular) domains. Weintroduce a new problem setting for reinforcement learning, the ExogenousMarkov Decision Process (ExoMDP), in which the state space admits an (unknown)factorization into a small controllable (or, endogenous) component and a largeirrelevant (or, exogenous) component; the exogenous component is independent ofthe learner's actions, but evolves in an arbitrary, temporally correlatedfashion. We provide a new algorithm, ExoRL, which learns a near-optimal policywith sample complexity polynomial in the size of the endogenous component andnearly independent of the size of the exogenous component, thereby offering adoubly-exponential improvement over off-the-shelf algorithms. Our resultshighlight for the first time that sample-efficient reinforcement learning ispossible in the presence of exogenous information, and provide a simple,user-friendly benchmark for investigation going forward.

Quick Read (beta)

loading the full paper ...