Abstract
Recent research has employed reinforcement learning (RL) algorithms to optimize long-term user engagement in recommender systems, thereby avoiding common pitfalls such as user boredom and filter bubbles. These algorithms capture the sequential and interactive nature of recommendations, and thus offer a principled way to deal with long-term rewards and avoid myopic behaviors. However, RL approaches are intractable in the slate recommendation scenario, where a list of items is recommended at each interaction turn, due to the combinatorial action space. In that setting, an action corresponds to a slate that may contain any combination of items. While previous work has proposed well-chosen decompositions of actions so as to ensure tractability, these rely on restrictive and sometimes unrealistic assumptions. Instead, in this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder. Then, the RL agent selects continuous actions in this latent space, which are ultimately decoded into the corresponding slates. By doing so, we are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates instead of independent items, in particular by enabling diversity. Our experiments performed on a wide array of simulated environments confirm the effectiveness of our generative modeling of slates over baselines in practical scenarios where the restrictive assumptions underlying the baselines are lifted. Our findings suggest that representation learning using generative models is a promising direction towards generalizable RL-based slate recommendation.
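To make the combinatorial argument and the latent-action idea concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It counts the ordered slates available for a given catalog size, then decodes a continuous latent action into a slate. The function names, the per-slot linear decoder, the dot-product scoring against item embeddings, and the random weights are all assumptions standing in for a trained variational auto-encoder decoder and a learned RL policy.

```python
import math
import random


def num_slates(num_items: int, slate_size: int) -> int:
    """Number of ordered slates with no repeated item."""
    return math.perm(num_items, slate_size)


def decode_slate(latent_action, decoder_weights, item_embeddings):
    """Hypothetical decoder: map one latent action to one item per slot.

    decoder_weights[k] is an (embed_dim x latent_dim) matrix for slot k.
    Each slot projects the latent action into item-embedding space and
    selects the item with the highest dot-product score. This sketch does
    not prevent the same item from appearing in several slots.
    """
    slate = []
    for slot_matrix in decoder_weights:
        # Linear projection of the latent action for this slot.
        query = [sum(w * z for w, z in zip(row, latent_action))
                 for row in slot_matrix]
        # Score every catalog item against the projected query.
        scores = [sum(q * e for q, e in zip(query, emb))
                  for emb in item_embeddings]
        slate.append(max(range(len(scores)), key=scores.__getitem__))
    return slate


if __name__ == "__main__":
    rng = random.Random(0)
    num_items, slate_size, latent_dim, embed_dim = 100, 5, 4, 8

    # Random stand-ins for learned item embeddings and decoder parameters.
    item_embeddings = [[rng.gauss(0, 1) for _ in range(embed_dim)]
                       for _ in range(num_items)]
    decoder_weights = [[[rng.gauss(0, 1) for _ in range(latent_dim)]
                        for _ in range(embed_dim)]
                       for _ in range(slate_size)]

    # In the described approach, this vector would be chosen by the RL agent.
    latent_action = [rng.gauss(0, 1) for _ in range(latent_dim)]

    print("discrete slate actions:", num_slates(num_items, slate_size))
    print("decoded slate:", decode_slate(latent_action, decoder_weights,
                                         item_embeddings))
```

Under these assumptions, the agent only has to choose a point in a 4-dimensional continuous space, whereas the discrete formulation would have to choose among every ordered combination of 5 items out of 100.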