Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

Abstract

Modern deep learning methods provide an effective means to learn good representations. However, is a good representation itself sufficient for efficient reinforcement learning? This question is largely unexplored, and the extant body of literature mainly focuses on conditions which permit efficient reinforcement learning, with little understanding of which conditions are necessary for it. This work provides strong negative results for reinforcement learning methods with function approximation for which a good representation (feature extractor) is known to the agent, focusing on natural representational conditions relevant to value-based learning and policy-based learning. For value-based learning, we show that even if the agent has a highly accurate linear representation, the agent still needs to sample exponentially many trajectories in order to find a near-optimal policy. For policy-based learning, we show that even if the agent's linear representation is capable of perfectly representing the optimal policy, the agent still needs to sample exponentially many trajectories in order to find a near-optimal policy. These lower bounds highlight the fact that having a good (value-based or policy-based) representation in and of itself is insufficient for efficient reinforcement learning. In particular, these results provide new insights into why the existing provably efficient reinforcement learning methods rely on further assumptions, which are often model-based in nature. Additionally, our lower bounds imply exponential separations in the sample complexity between 1) value-based learning with a perfect representation and value-based learning with a good-but-not-perfect representation, 2) value-based learning and policy-based learning, 3) policy-based learning and supervised learning, and 4) reinforcement learning and imitation learning.
