Abstract
Real-world applications of reinforcement learning often involve environmentswhere agents operate on complex, high-dimensional observations, but theunderlying (''latent'') dynamics are comparatively simple. However, outside ofrestrictive settings such as small latent spaces, the fundamental statisticalrequirements and algorithmic principles for reinforcement learning under latentdynamics are poorly understood. This paper addresses the question of reinforcement learning under$\textit{general}$ latent dynamics from a statistical and algorithmicperspective. On the statistical side, our main negative result shows that mostwell-studied settings for reinforcement learning with function approximationbecome intractable when composed with rich observations; we complement thiswith a positive result, identifying latent pushforward coverability as ageneral condition that enables statistical tractability. Algorithmically, wedevelop provably efficient observable-to-latent reductions -- that is,reductions that transform an arbitrary algorithm for the latent MDP into analgorithm that can operate on rich observations -- in two settings: one wherethe agent has access to hindsight observations of the latent dynamics [LADZ23],and one where the agent can estimate self-predictive latent models [SAGHCB20].Together, our results serve as a first step toward a unified statistical andalgorithmic theory for reinforcement learning under latent dynamics.