Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems

Abstract

Reinforcement Learning-based Recommender Systems (RLRS) have shown promise across a spectrum of applications, from e-commerce platforms to streaming services. Yet, they grapple with challenges, notably in crafting reward functions and harnessing large pre-existing datasets within the RL framework. Recent advancements in offline RLRS address these two challenges. However, existing methods mainly rely on the transformer architecture, which, as sequence lengths increase, can introduce challenges associated with computational resources and training costs. Additionally, the prevalent methods employ fixed-length input trajectories, restricting their capacity to capture evolving user preferences. In this study, we introduce a new offline RLRS method to deal with the above problems. We reinterpret the RLRS challenge by modeling sequential decision-making as an inference task, leveraging adaptive masking configurations. This adaptive approach selectively masks input tokens, transforming the recommendation task into an inference challenge based on varying token subsets, thereby enhancing the agent's ability to infer across diverse trajectory lengths. Furthermore, we incorporate a multi-scale segmented retention mechanism that facilitates efficient modeling of long sequences, significantly enhancing computational efficiency. Our experimental analysis, conducted on both an online simulator and offline datasets, clearly demonstrates the advantages of our proposed method.
