Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning

Abstract

A key question in reinforcement learning is how an intelligent agent can generalize knowledge across different inputs. By generalizing across different inputs, information learned for one input can be immediately reused for improving predictions for another input. Reusing information allows an agent to compute an optimal decision-making strategy using less data. State representation is a key element of the generalization process, compressing a high-dimensional input space into a low-dimensional latent state space. This article analyzes properties of different latent state spaces, leading to new connections between model-based and model-free reinforcement learning. Successor features, which predict frequencies of future observations, form a link between model-based and model-free learning: Learning to predict future expected reward outcomes, a key characteristic of model-based agents, is equivalent to learning successor features. Learning successor features is a form of temporal difference learning and is equivalent to learning to predict a single policy's utility, which is a characteristic of model-free agents. Drawing on the connection between model-based reinforcement learning and successor features, we demonstrate that representations that are predictive of future reward outcomes generalize across variations in both transitions and rewards. This result extends previous work on successor features, which is constrained to fixed transitions and assumes re-learning of the transferred state representation.
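
As an illustration of the claim that learning successor features is a form of temporal difference learning, the following is a minimal tabular sketch, not the method used in the article. It assumes a small hypothetical three-state cycle, one-hot state features, a fixed policy summarized by a transition matrix `P`, and a reward that is linear in the features with weights `w`; the function name `learn_successor_features` and all parameter values are chosen here for illustration only.

```python
import numpy as np

def learn_successor_features(P, phi, gamma=0.9, alpha=0.1, steps=20000, seed=0):
    """TD(0) estimate of psi(s) = E[sum_t gamma^t phi(s_t) | s_0 = s].

    P   : (S, S) transition matrix under one fixed policy (assumed given)
    phi : (S, d) feature matrix, one feature row per state (assumed given)
    """
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    psi = np.zeros_like(phi, dtype=float)
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n_states, p=P[s])
        # Same form as a TD value update, with the feature vector phi[s]
        # taking the place of the scalar reward.
        td_error = phi[s] + gamma * psi[s_next] - psi[s]
        psi[s] += alpha * td_error
        s = s_next
    return psi

# Hypothetical example: deterministic cycle 0 -> 1 -> 2 -> 0, one-hot features.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
phi = np.eye(3)
psi = learn_successor_features(P, phi)

# Assumed linear reward r(s) = phi(s) . w; here only state 2 is rewarding.
w = np.array([0.0, 0.0, 1.0])
values = psi @ w  # value of the fixed policy under reward weights w
```

Under these assumptions, `psi @ w` recovers the fixed policy's value function, so changing `w` re-evaluates the same policy for a new reward without re-running temporal difference learning.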
