MODRL-TA: A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search

Abstract

Traffic allocation is the process of redistributing natural traffic to products by adjusting their positions in the post-search phase. Its aims are to foster merchant growth, meet customer demands precisely, and maximize the interests of all parties on an e-commerce platform. Existing methods based on learning to rank neglect the long-term value of traffic allocation, whereas reinforcement learning approaches struggle to balance multiple objectives and suffer from cold-start difficulties in real-world data environments. To address these issues, this paper proposes a multi-objective deep reinforcement learning framework consisting of multi-objective Q-learning (MOQ), a decision fusion algorithm (DFM) based on the cross-entropy method (CEM), and a progressive data augmentation system (PDA). Specifically, MOQ constructs an ensemble of RL models, each dedicated to one objective, such as click-through rate or conversion rate. Each model treats the position of an item as its action and estimates the long-term value of its own objective. We then employ DFM to dynamically adjust the weights among objectives so as to maximize long-term value, which addresses the temporal dynamics of objective preferences in e-commerce scenarios. PDA initially trains MOQ with simulated data derived from offline logs. As experiments progress, it strategically integrates real user interaction data, ultimately replacing the simulated dataset, to alleviate distributional shift and the cold-start problem. Experimental results on real-world online e-commerce systems demonstrate significant improvements from MODRL-TA, and we have successfully deployed MODRL-TA on an e-commerce search platform.
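
As a rough illustration of the kind of weight search a CEM-based fusion step performs, the following Python sketch samples candidate objective weights, keeps the best-scoring ones, and refits the sampling distribution to them. It is not the paper's DFM. The function names, the toy Q-values, the target weighting, and the scoring function are all hypothetical stand-ins, and the softmax parameterization of the weights is an assumption made here for convenience.

```python
import math
import random


def softmax(logits):
    """Map unconstrained logits to non-negative weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_q_values(q_per_objective, weights):
    """Weighted sum of per-objective Q-values for each candidate position."""
    n_actions = len(q_per_objective[0])
    return [
        sum(w * q[a] for w, q in zip(weights, q_per_objective))
        for a in range(n_actions)
    ]


def cem_search(score_fn, n_objectives, iterations=30, population=64,
               elite_frac=0.25, seed=0):
    """Cross-entropy method over weight logits; returns the final mean weights."""
    rng = random.Random(seed)
    mean = [0.0] * n_objectives
    std = [1.0] * n_objectives
    n_elite = max(1, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate logits from the current diagonal Gaussian.
        samples = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(population)
        ]
        # Rank candidates by the score of their normalized weights.
        samples.sort(key=lambda z: score_fn(softmax(z)), reverse=True)
        elites = samples[:n_elite]
        # Refit the sampling distribution to the elite set.
        for i in range(n_objectives):
            column = [e[i] for e in elites]
            mean[i] = sum(column) / n_elite
            var = sum((x - mean[i]) ** 2 for x in column) / n_elite
            std[i] = math.sqrt(var) + 1e-3  # small floor keeps some exploration
    return softmax(mean)


# Hypothetical Q-values: rows are objectives (e.g. CTR, CVR), columns are positions.
TOY_Q = [[0.9, 0.4, 0.1],
         [0.2, 0.8, 0.3]]
# Hypothetical preferred weighting, used only to build a toy score.
TOY_TARGET = [0.7, 0.3]


def toy_score(weights):
    """Toy stand-in for long-term value: higher when weights are near TOY_TARGET."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TOY_TARGET))


weights = cem_search(toy_score, n_objectives=2)
fused = fuse_q_values(TOY_Q, weights)
best_position = max(range(len(fused)), key=fused.__getitem__)
```

In a real system the score would come from observed or estimated long-term feedback rather than a known target, and the search would be rerun as objective preferences drift over time.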
