When Is Generalizable Reinforcement Learning Tractable?

Abstract

Agents trained by reinforcement learning (RL) often fail to generalize beyondthe environment they were trained in, even when presented with new scenariosthat seem very similar to the training environment. We study the querycomplexity required to train RL agents that can generalize to multipleenvironments. Intuitively, tractable generalization is only possible when theenvironments are similar or close in some sense. To capture this, we introduceStrong Proximity, a structural condition which precisely characterizes therelative closeness of different environments. We provide an algorithm whichexploits Strong Proximity to provably and efficiently generalize. We also showthat under a natural weakening of this condition, which we call Weak Proximity,RL can require query complexity that is exponential in the horizon togeneralize. A key consequence of our theory is that even when the environmentsshare optimal trajectories, and have highly similar reward and transitionfunctions (as measured by classical metrics), tractable generalization isimpossible.

Quick Read (beta)

loading the full paper ...