Abstract
One key challenge in reinforcement learning is the ability to generalize knowledge in control problems. While deep learning methods have been successfully combined with model-free reinforcement-learning algorithms, how to perform model-based reinforcement learning in the presence of approximation errors remains an open problem. Using successor features, a feature representation that predicts a temporal constraint, this paper presents three contributions: First, it shows how learning successor features is equivalent to model-free learning. Then, it shows how successor features encode model reductions that compress the state space by creating state partitions of bisimilar states. Using this representation, an intelligent agent is guaranteed to accurately predict future reward outcomes, a key property of model-based reinforcement-learning algorithms. Lastly, it presents a loss objective and prediction error bounds showing that accurately predicting value functions and reward sequences is possible with an approximation of successor features. On finite control problems, we illustrate how minimizing this loss objective results in approximate bisimulations. The results presented in this paper provide a novel understanding of representations that can support model-free and model-based reinforcement learning.