Abstract
Offline reinforcement learning (RL) is an effective tool for real-world recommender systems because it can model users' dynamic interests and because of its interactive nature. Most existing offline RL recommender systems focus on model-based RL: they learn a world model from offline data and build the recommendation policy by interacting with this model. Although these methods have improved recommendation performance, their effectiveness is often limited by two factors: the accuracy of the estimated reward model and the estimation of model uncertainty. Both problems stem primarily from the large discrepancy between offline logged data and real-world user interactions on online platforms. To close this gap, model-based RL methods need more accurate reward modeling and uncertainty estimation. In this paper, a novel model-based method, Reward Shaping in Offline Reinforcement Learning for Recommender Systems (ROLeR), is proposed to improve reward and uncertainty estimation in recommender systems. Specifically, a non-parametric reward shaping method is designed to refine the reward model. In addition, a flexible and more representative uncertainty penalty is designed to fit the needs of recommender systems. Extensive experiments on four benchmark datasets show that ROLeR achieves state-of-the-art performance compared with existing baselines. The source code can be downloaded at https://github.com/ArronDZhang/ROLeR.