Measuring and Characterizing Generalization in Deep Reinforcement Learning

Abstract

Deep reinforcement-learning methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent's performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. Taken together, these results call into question the extent to which deep Q-networks learn generalized representations, and suggest that more experimentation and analysis is necessary before claims of representation learning can be supported.
