Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation

Abstract

Reinforcement learning aims at searching for the best policy model for decision making, and has been shown to be powerful for sequential recommendations. The training of the policy by reinforcement learning, however, takes place in an environment. In many real-world applications, policy training in the real environment can incur an unbearable cost due to exploration in the environment. Environment reconstruction from past data is thus an appealing way to release the power of reinforcement learning in these applications. The reconstruction of the environment is, basically, to extract the causal effect model from the data. However, real-world applications are often too complex to offer fully observable environment information. Therefore, quite possibly there are unobserved confounding variables lying behind the data. The hidden confounder can obstruct an effective reconstruction of the environment. In this paper, by treating the hidden confounder as a hidden policy, we propose a deconfounded multi-agent environment reconstruction (DEMER) approach in order to learn the environment together with the hidden confounder. DEMER adopts a multi-agent generative adversarial imitation learning framework. It introduces the confounder embedded policy and uses the compatible discriminator for training the policies. We then apply DEMER to an application of driver program recommendation. We first use an artificial driver program recommendation environment, abstracted from the real application, to verify and analyze the effectiveness of DEMER. We then test DEMER in the real application of Didi Chuxing. Experimental results show that DEMER can effectively reconstruct the hidden confounder, and thus can build the environment better. DEMER also derives a recommendation policy with significantly improved performance in the test phase of the real application.
