Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning

Abstract

We study the budget allocation problem in online marketing campaigns that utilize previously collected offline data. We first discuss the long-term effect of optimizing marketing budget allocation decisions in the offline setting. To overcome this challenge, we propose a novel game-theoretic offline value-based reinforcement learning method using mixed policies. The proposed method reduces the number of policies that must be stored from infinitely many, as in previous methods, to a constant number, which achieves nearly optimal policy efficiency and makes the method practical and favorable for industrial usage. We further show that this method is guaranteed to converge to the optimal policy, which cannot be achieved by previous value-based reinforcement learning methods for marketing budget allocation. Our experiments on a large-scale marketing campaign with tens of millions of users and a budget of more than one billion verify the theoretical results and show that the proposed method outperforms various baseline methods. The proposed method has been successfully deployed to serve all the traffic of this marketing campaign.
