SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Abstract

This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when the successor features are learned using deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as the deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.
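The decomposition and the GPI step described in the abstract can be illustrated with a small sketch. It assumes the standard linear form Q(s, a) = psi(s, a) . w, where psi is the successor feature and w is the reward mapping, and it uses hand-written feature vectors in place of a trained network. The function names, the toy numbers, and the single-state setup are all hypothetical and are not taken from the paper.

```python
# Minimal sketch of SF & GPI action selection in a single state.
# Assumption: Q(s, a) = psi(s, a) . w, with psi shared across tasks and
# w specific to each task. All values below are made-up toy numbers.

def q_value(psi_sa, w):
    """Q-value as the inner product of a successor feature and a reward mapping."""
    return sum(p * wi for p, wi in zip(psi_sa, w))

def gpi_action(psi_per_policy, w):
    """Generalized policy improvement for one state.

    psi_per_policy[i][a] is the successor feature vector of source policy i
    for action a. For each action, take the best Q-value over all source
    policies, then pick the action with the largest such value.
    """
    num_actions = len(psi_per_policy[0])
    best_per_action = [
        max(q_value(psi[a], w) for psi in psi_per_policy)
        for a in range(num_actions)
    ]
    return max(range(num_actions), key=lambda a: best_per_action[a])

# Two hypothetical source policies, three actions, two-dimensional features.
psi_per_policy = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 0.2]],
    [[0.2, 0.1], [0.1, 0.9], [0.3, 0.3]],
]

# A new task only changes the reward mapping; the successor features are reused.
w_new = [0.0, 1.0]
action = gpi_action(psi_per_policy, w_new)
print(action)
```

In this sketch, switching to a different task means replacing only `w_new`, which is the sense in which knowledge about the shared transition dynamics transfers across tasks.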
