Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning

Abstract

For online advertising in e-commerce, the traditional problem is to assign the right ad to the right user on fixed ad slots. In this paper, we investigate the problem of advertising with adaptive exposure, in which the number of ad slots and their locations can change dynamically over time based on their relative scores with recommendation products. To maintain user retention and long-term revenue, two types of constraints must be met in exposure: query-level and day-level constraints. We model this problem as a constrained Markov decision process with per-state constraints (psCMDP) and propose a constrained two-level reinforcement learning approach that decouples the original advertising exposure optimization problem into two relatively independent sub-optimization problems. We also propose a constrained hindsight experience replay mechanism to accelerate policy training. Experimental results show that our method improves advertising revenue while satisfying different levels of constraints on real-world datasets. In addition, the constrained hindsight experience replay mechanism significantly improves training speed and the stability of policy performance.
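To make the notion of adaptive exposure concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes ads and recommendation products each carry a precomputed score, ranks them together, and enforces a query-level cap on the number of ads shown. The function name `merge_with_adaptive_exposure` and the parameters `page_size` and `max_ads_per_query` are hypothetical, and the day-level constraint and the learned policy are not modeled.

```python
from typing import List, Tuple

# (item id, score, is_ad) -- an illustrative record layout, not from the paper.
Item = Tuple[str, float, bool]


def merge_with_adaptive_exposure(
    ads: List[Tuple[str, float]],
    recs: List[Tuple[str, float]],
    page_size: int,
    max_ads_per_query: int,
) -> List[Item]:
    """Rank ads and recommendations together by score.

    Ad positions are not fixed: an ad appears wherever its score places it
    relative to the recommendation products. The cap max_ads_per_query is a
    hypothetical stand-in for a query-level constraint.
    """
    candidates: List[Item] = [(i, s, True) for i, s in ads]
    candidates += [(i, s, False) for i, s in recs]
    # Highest score first; ties keep their input order (the sort is stable).
    candidates.sort(key=lambda c: c[1], reverse=True)

    page: List[Item] = []
    ads_shown = 0
    for item_id, score, is_ad in candidates:
        if len(page) == page_size:
            break
        if is_ad:
            if ads_shown >= max_ads_per_query:
                continue  # cap reached: skip this ad, keep filling the page
            ads_shown += 1
        page.append((item_id, score, is_ad))
    return page


if __name__ == "__main__":
    ads = [("a1", 0.9), ("a2", 0.8), ("a3", 0.1)]
    recs = [("r1", 0.85), ("r2", 0.5), ("r3", 0.3)]
    for item in merge_with_adaptive_exposure(ads, recs, 4, 2):
        print(item)
```

In this sketch the number of ads on a page and their positions both depend on the scores, which is the property that distinguishes adaptive exposure from fixed ad slots. Tightening `max_ads_per_query` reduces how many ads can appear without changing the ranking of the remaining items.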
