On the Role of Discount Factor in Offline Reinforcement Learning

Abstract

Offline reinforcement learning (RL) enables effective learning from previously collected data without exploration, which shows great promise in real-world applications where exploration is expensive or even infeasible. The discount factor, $\gamma$, plays a vital role in improving the sample efficiency and estimation accuracy of online RL, but its role in offline RL is not well explored. This paper examines two distinct effects of $\gamma$ in offline RL with theoretical analysis, namely the regularization effect and the pessimism effect. On the one hand, $\gamma$ acts as a regularizer that trades off optimality against sample efficiency when used on top of existing offline techniques. On the other hand, a lower guidance $\gamma$ can also be seen as a form of pessimism, in which we optimize the policy's performance in the worst possible models. We empirically verify these theoretical observations with tabular MDPs and standard D4RL tasks. The results show that the discount factor plays an essential role in the performance of offline RL algorithms, both in small-data regimes on top of existing offline methods and in large-data regimes without other conservative methods.
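
To make the idea of a guidance discount factor concrete, the following is a minimal sketch and not the paper's experimental setup. It assumes a small random tabular MDP, known rewards, and a transition model estimated from a few samples per state-action pair. It plans on the estimated model with several guidance values of $\gamma$ and then evaluates each resulting policy on the true model with a fixed evaluation discount. All names (`value_iteration`, `evaluate_policy`, `estimate_model`, `gamma_eval`) and all sizes are hypothetical choices made for illustration.

```python
import numpy as np


def value_iteration(P, R, gamma, iters=1000):
    """Greedy policy from value iteration.

    P: (S, A, S) transition probabilities, R: (S, A) rewards.
    """
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)          # state values under the greedy policy
        Q = R + gamma * (P @ V)    # Bellman optimality backup
    return Q.argmax(axis=1)


def evaluate_policy(P, R, policy, gamma):
    """Exact state values of a deterministic policy: solve (I - gamma P_pi) V = R_pi."""
    S = P.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, policy]          # (S, S) transitions under the policy
    R_pi = R[idx, policy]          # (S,) rewards under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)


def estimate_model(P, n_samples, rng):
    """Empirical transition model from n_samples draws per (s, a).

    Uniform coverage of every (s, a) pair is an assumption of this sketch.
    """
    S, A, _ = P.shape
    P_hat = np.zeros_like(P)
    for s in range(S):
        for a in range(A):
            nxt = rng.choice(S, size=n_samples, p=P[s, a])
            P_hat[s, a] = np.bincount(nxt, minlength=S) / n_samples
    return P_hat


rng = np.random.default_rng(0)
S, A = 10, 3                                   # illustrative sizes
P = rng.dirichlet(np.ones(S), size=(S, A))     # true transitions, shape (S, A, S)
R = rng.uniform(size=(S, A))                   # rewards, assumed known
gamma_eval = 0.99                              # discount used for evaluation

P_hat = estimate_model(P, n_samples=5, rng=rng)  # small-data regime

results = {}
for gamma_guide in (0.5, 0.9, 0.99):
    # Plan on the estimated model with the guidance discount ...
    policy = value_iteration(P_hat, R, gamma_guide)
    # ... but judge the policy on the true model with the evaluation discount.
    results[gamma_guide] = evaluate_policy(P, R, policy, gamma_eval)
    print(f"guidance gamma={gamma_guide}: mean true value={results[gamma_guide].mean():.3f}")
```

Which guidance value performs best depends on the random seed, the MDP, and the amount of data, so the sketch is only a way to experiment with the trade-off, not evidence for any particular outcome.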
