A Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Abstract

Reinforcement learning is effective in optimizing policies for recommender systems. Current solutions mostly focus on model-free approaches, which require frequent interactions with a real environment, and thus are expensive in model learning. Offline evaluation methods, such as importance sampling, can alleviate such limitations, but usually require a large amount of logged data and do not work well when the action space is large. In this work, we propose a model-based reinforcement learning solution which models the user-agent interaction for offline policy learning via a generative adversarial network. To reduce bias in the learnt policy, we use the discriminator to evaluate the quality of generated sequences and rescale the generated rewards. Our theoretical analysis and empirical evaluations demonstrate the effectiveness of our solution in identifying patterns from given offline data and learning policies based on the offline and generated data.
