Deep Transfer $Q$-Learning for Offline Non-Stationary Reinforcement Learning

Abstract

In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations, especially when sample sizes are limited. While existing transfer learning methods primarily focus on linear regression settings, they lack direct applicability to reinforcement learning algorithms. This paper pioneers the study of transfer learning for dynamic decision scenarios modeled by non-stationary finite-horizon Markov decision processes, utilizing neural networks as powerful function approximators and backward inductive learning. We demonstrate that naive sample pooling strategies, effective in regression settings, fail in Markov decision processes. To address this challenge, we introduce a novel ``re-weighted targeting procedure'' to construct ``transferable RL samples'' and propose ``transfer deep $Q^*$-learning'', enabling neural network approximation with theoretical guarantees. We assume that the reward functions are transferable and deal with both situations in which the transition densities are transferable or nontransferable. Our analytical techniques for transfer learning in neural network approximation and transition density transfers have broader implications, extending to supervised transfer learning with neural networks and domain shift scenarios. Empirical experiments on both synthetic and real datasets corroborate the advantages of our method, showcasing its potential for improving decision-making through strategically constructing transferable RL samples in non-stationary reinforcement learning contexts.
