Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation

Abstract

Modern listwise recommendation systems need to consider both long-term user perceptions and short-term interest shifts. Reinforcement learning can be applied to recommendation to study this problem, but it is subject to a large search space, sparse user feedback, and long interactive latency. Motivated by recent progress in hierarchical reinforcement learning, we propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation. Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy by modeling the process as a sequential decision-making problem. We argue that such a framework has a well-defined decomposition of the outra-session context and the intra-session context, which are encoded by the high-level and low-level agents, respectively. To verify this argument, we implement both a simulator-based environment and an industrial dataset-based experiment. Results show a significant performance improvement for our method compared with several well-known baselines. Data and code have been made public.
