Abstract
In this paper, we study a multi-step interactive recommendation problem, where the item recommended at the current step may affect the quality of future recommendations. To address the problem, we develop a novel and effective approach, named CFRL, which seamlessly integrates the ideas of both collaborative filtering (CF) and reinforcement learning (RL). More specifically, we first model the recommender-user interactive recommendation problem as an agent-environment RL task, which is mathematically described by a Markov decision process (MDP). Further, to achieve collaborative recommendations for the entire user community, we propose a novel CF-based MDP by encoding the states of all users into a shared latent vector space. Finally, we propose an effective Q-network learning method to learn the agent's optimal policy based on the CF-based MDP. The capability of CFRL is demonstrated by comparing its performance against a variety of existing methods on real-world datasets.
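The abstract does not specify the state encoder or the Q-network architecture, so the following is only a minimal sketch of the general idea under assumed choices, not the CFRL method itself. It assumes a user's state is the mean of the latent vectors of the items in that user's history, so that all users share one latent space. It also replaces the Q-network with a linear function Q(s, a) = w_a · s trained by one-step Q-learning. All names (`encode_state`, `q_value`, `td_update`, `item_vecs`, `weights`) are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def encode_state(history: List[int], item_vecs: Dict[int, Vector], dim: int) -> Vector:
    """Assumed encoder: a user's state is the mean latent vector of the items
    in their interaction history, so every user lives in one shared space."""
    if not history:
        return [0.0] * dim
    total = [0.0] * dim
    for item in history:
        for k, x in enumerate(item_vecs[item]):
            total[k] += x
    return [x / len(history) for x in total]


def q_value(state: Vector, action: int, weights: Dict[int, Vector]) -> float:
    """Linear stand-in for a Q-network: Q(s, a) = w_a . s."""
    return dot(weights[action], state)


def td_update(state: Vector, action: int, reward: float, next_state: Vector,
              weights: Dict[int, Vector], lr: float = 0.1, gamma: float = 0.9) -> float:
    """One Q-learning step toward r + gamma * max_b Q(s', b); returns the TD error."""
    best_next = max(q_value(next_state, b, weights) for b in weights)
    error = reward + gamma * best_next - q_value(state, action, weights)
    # For a linear Q(s, a), the gradient with respect to w_a is the state itself.
    weights[action] = [w + lr * error * s for w, s in zip(weights[action], state)]
    return error


# Usage: two items with hypothetical latent vectors; recommending item 0
# from state s earns reward 1 and leads to an all-zero next state.
item_vecs = {0: [1.0, 0.0], 1: [0.0, 1.0]}
weights = {0: [0.0, 0.0], 1: [0.0, 0.0]}
s = encode_state([0], item_vecs, dim=2)  # [1.0, 0.0]
for _ in range(50):
    td_update(s, 0, 1.0, [0.0, 0.0], weights, lr=0.5)
# q_value(s, 0, weights) now approaches 1.0
```

Because the state space is shared, a weight vector learned from one user's transitions immediately generalizes to any other user whose history maps to a nearby latent state. This is the collaborative effect the CF-based MDP aims for. A real Q-network would replace the linear `q_value` with a learned nonlinear function.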