Abstract
In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can we design online interventions that will help humans achieve certain goals in such an asynchronous setting? In this paper, we address this problem from the perspective of deep reinforcement learning of marked temporal point processes, where both the actions taken by an agent and the feedback it receives from the environment are asynchronous stochastic discrete events characterized using marked temporal point processes. In doing so, we define the agent's policy using the intensity and mark distribution of the corresponding process and then derive a flexible policy gradient method, which embeds the agent's actions and the feedback it receives into real-valued vectors using deep recurrent neural networks. Our method does not make any assumptions about the functional form of the intensity and mark distribution of the feedback, and it allows for arbitrarily complex reward functions. We apply our methodology to two different applications, personalized teaching and viral marketing, and, using data gathered from Duolingo and Twitter, we show that it may be able to find interventions that help learners and marketers achieve their goals more effectively than alternatives.