Learning to Collaborate: Multi-Scenario Ranking via Multi-Agent Reinforcement Learning

Abstract

Ranking is a fundamental and widely studied problem in scenarios such as search, advertising, and recommendation. However, joint optimization for multi-scenario ranking, which aims to improve the overall performance of several ranking strategies in different scenarios, remains largely unexplored. Separately optimizing each individual strategy has two limitations. The first is a lack of collaboration between scenarios: each strategy maximizes its own objective but ignores the goals of the other strategies, leading to sub-optimal overall performance. The second is the inability to model the correlation between scenarios: independent optimization in one scenario uses only its own user data and ignores the context in other scenarios. In this paper, we formulate multi-scenario ranking as a fully cooperative, partially observable, multi-agent sequential decision problem. We propose a novel model named Multi-Agent Recurrent Deterministic Policy Gradient (MA-RDPG), which has a communication component for passing messages, several private actors (agents) that take ranking actions, and a centralized critic for evaluating the overall performance of the co-working actors. Each scenario is treated as an agent (actor). Agents collaborate by sharing a global action-value function (the critic) and passing messages that encode historical information across scenarios. The model is evaluated in online settings on a large E-commerce platform. Results show that the proposed model achieves significant improvements over the baselines in overall performance.
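
To make the three components named in the abstract concrete, the following is a minimal forward-pass sketch, not the authors' implementation. Several things are assumptions made only for illustration: the message is updated by a plain tanh recurrent cell (the abstract does not specify the cell type), the actor and critic are single linear layers, the scenario names `search` and `recommendation`, all dimensions, and the toy observations are hypothetical, and training (the policy-gradient updates) is omitted entirely.

```python
import math
import random


def linear(weights, vec):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class Actor:
    """Private deterministic policy of one scenario: (message, local obs) -> action."""

    def __init__(self, msg_dim, obs_dim, act_dim, rng):
        self.w = rand_matrix(act_dim, msg_dim + obs_dim, rng)

    def act(self, message, obs):
        return [math.tanh(v) for v in linear(self.w, message + obs)]


class Communication:
    """Shared recurrent message. Assumption: a plain tanh RNN cell."""

    def __init__(self, msg_dim, obs_dim, act_dim, rng):
        self.w = rand_matrix(msg_dim, msg_dim + obs_dim + act_dim, rng)

    def update(self, message, obs, action):
        return [math.tanh(v) for v in linear(self.w, message + obs + action)]


class Critic:
    """Centralized action-value estimate shared by all actors."""

    def __init__(self, msg_dim, obs_dim, act_dim, rng):
        self.w = rand_matrix(1, msg_dim + obs_dim + act_dim, rng)

    def q_value(self, message, obs, action):
        return linear(self.w, message + obs + action)[0]


def run_episode(actors, comm, critic, observations, msg_dim):
    """Step through a session in which scenarios act one after another."""
    message = [0.0] * msg_dim  # no history at the start of the session
    trace = []
    for agent_id, obs in observations:
        action = actors[agent_id].act(message, obs)       # private actor
        q = critic.q_value(message, obs, action)          # shared critic
        message = comm.update(message, obs, action)       # pass history on
        trace.append((agent_id, action, q))
    return trace, message


# Hypothetical sizes and scenario names, chosen only for this sketch.
MSG_DIM, OBS_DIM, ACT_DIM = 4, 3, 2
rng = random.Random(0)
actors = {
    "search": Actor(MSG_DIM, OBS_DIM, ACT_DIM, rng),
    "recommendation": Actor(MSG_DIM, OBS_DIM, ACT_DIM, rng),
}
comm = Communication(MSG_DIM, OBS_DIM, ACT_DIM, rng)
critic = Critic(MSG_DIM, OBS_DIM, ACT_DIM, rng)

# A toy user session that alternates between the two scenarios.
session = [
    ("search", [1.0, 0.0, 0.5]),
    ("recommendation", [0.2, 0.8, 0.1]),
    ("search", [0.0, 1.0, 0.3]),
]
trace, final_message = run_episode(actors, comm, critic, session, MSG_DIM)
```

Each actor sees only its own observation plus the shared message, so information from one scenario can reach another only through the message, while the single critic scores every step. That is the sense in which the sketch mirrors the "private actors, centralized critic, communication component" split described above.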
