Toward Simulating Environments in Reinforcement Learning Based Recommendations

Abstract

With recent advances in Reinforcement Learning (RL), there has been tremendous interest in employing RL for recommender systems. However, directly training and evaluating a new RL-based recommendation algorithm requires collecting users' real-time feedback in the real system, which is time-consuming and labor-intensive and could negatively impact users' experience. This calls for a user simulator that can mimic real users' behaviors, on which new recommendation algorithms can be pre-trained and evaluated. Simulating users' behaviors in a dynamic system faces immense challenges: (i) the underlying item distribution is complex, and (ii) historical logs for each user are limited. In this paper, we develop a user simulator based on a Generative Adversarial Network (GAN). Specifically, the generator captures the underlying distribution of users' historical logs and generates realistic logs that can be considered augmentations of real logs, while the discriminator not only distinguishes real from fake logs but also predicts users' behaviors. Experimental results on real-world e-commerce data demonstrate the effectiveness of the proposed simulator.
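To make the discriminator's dual role concrete, the following is a minimal sketch, not the paper's implementation. It assumes one possible output layout: for K behavior types, the discriminator emits 2K logits, where the first K mean "real log, behavior k" and the last K mean "fake log, behavior k". The function name, the behavior labels, and the layout itself are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def read_discriminator_output(logits, behaviors):
    """Split 2K logits into a real/fake score and a behavior prediction.

    Assumed layout (hypothetical, for illustration only):
      logits[0:K]  -> "real log, behavior k"
      logits[K:2K] -> "fake log, behavior k"
    """
    k = len(behaviors)
    if len(logits) != 2 * k:
        raise ValueError("expected 2 * len(behaviors) logits")
    probs = softmax(logits)
    # Probability that the log is real: total mass on the "real" half.
    p_real = sum(probs[:k])
    # Behavior distribution: marginalize over the real/fake split.
    behavior_probs = [probs[i] + probs[i + k] for i in range(k)]
    best = max(range(k), key=lambda i: behavior_probs[i])
    return p_real, behaviors[best], behavior_probs


behaviors = ["skip", "click", "purchase"]  # hypothetical feedback types
p_real, predicted, dist = read_discriminator_output(
    [2.0, 0.0, 0.0, 0.0, 0.0, 0.0], behaviors
)
```

Under this assumed layout, a single softmax serves both tasks: summing over the "real" half gives the real-versus-fake score, and summing matching pairs gives the predicted user behavior.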
