Theoretical Barriers in Bellman-Based Reinforcement Learning

Abstract

Reinforcement Learning algorithms designed for high-dimensional spaces often enforce the Bellman equation on a sampled subset of states, relying on generalization to propagate knowledge across the state space. In this paper, we identify and formalize a fundamental limitation of this common approach. Specifically, we construct counterexample problems with a simple structure that this approach fails to exploit. Our findings reveal that such algorithms can neglect critical information about the problems, leading to inefficiencies. Furthermore, we extend this negative result to another approach from the literature: Hindsight Experience Replay learning state-to-state reachability.
