Reinforcement Learning with Policy Mixture Model for Temporal Point Processes Clustering

Abstract

Temporal point processes are an expressive tool for modeling event sequences over time. In this paper, we take a reinforcement learning view whereby the observed sequences are assumed to be generated from a mixture of latent policies. The goal is to cluster sequences with different temporal patterns into their underlying policies while learning each policy model. The flexibility of our model lies in two aspects: i) all components are networks, including the policy network that models the intensity function of the temporal point process; ii) to handle varying-length event sequences, we resort to inverse reinforcement learning, decomposing each observed sequence into states (the RNN hidden embedding of the history) and actions (the time interval to the next event) in order to learn the reward function. This achieves better performance or higher efficiency than existing methods that use rewards over the entire sequence, such as log-likelihood or Wasserstein distance. We adopt an expectation-maximization framework in which the E-step estimates the cluster label of each sequence and the M-step learns the respective policy. Extensive experiments show the efficacy of our method against state-of-the-art methods.
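
The E-step/M-step alternation described in the abstract can be illustrated with a deliberately simplified sketch. It is not the method of the paper: each neural policy learned by inverse reinforcement learning is replaced here by a single exponential distribution over inter-event intervals (a homogeneous Poisson process), so that both steps have closed forms. All function names, the synthetic data, and the initial rates are assumptions made for illustration only.

```python
import math
import random


def sequence_log_likelihood(intervals, rate):
    """Log-likelihood of inter-event intervals under an exponential 'policy'.

    Assumption: a constant-rate stand-in for the paper's policy network.
    """
    return len(intervals) * math.log(rate) - rate * sum(intervals)


def e_step(sequences, rates, weights):
    """E-step: posterior probability of each cluster for every sequence."""
    responsibilities = []
    for intervals in sequences:
        log_post = [math.log(w) + sequence_log_likelihood(intervals, r)
                    for r, w in zip(rates, weights)]
        peak = max(log_post)  # log-sum-exp trick for numerical stability
        unnorm = [math.exp(v - peak) for v in log_post]
        total = sum(unnorm)
        responsibilities.append([u / total for u in unnorm])
    return responsibilities


def m_step(sequences, responsibilities, num_clusters):
    """M-step: re-fit each cluster's rate and mixing weight.

    The weighted maximum-likelihood rate is (expected event count) /
    (expected total duration); small floors avoid log(0) and division by zero.
    """
    rates, weights = [], []
    for k in range(num_clusters):
        resp_k = [row[k] for row in responsibilities]
        events = sum(r * len(s) for r, s in zip(resp_k, sequences))
        duration = sum(r * sum(s) for r, s in zip(resp_k, sequences))
        rates.append(max(events, 1e-12) / max(duration, 1e-12))
        weights.append(max(sum(resp_k) / len(sequences), 1e-12))
    return rates, weights


def fit_mixture(sequences, init_rates, num_iters=50):
    """Alternate E- and M-steps for a fixed number of iterations."""
    rates = list(init_rates)
    weights = [1.0 / len(rates)] * len(rates)
    for _ in range(num_iters):
        responsibilities = e_step(sequences, rates, weights)
        rates, weights = m_step(sequences, responsibilities, len(rates))
    return rates, weights, e_step(sequences, rates, weights)


# Hypothetical synthetic data: 20 slow sequences (rate 0.5), 20 fast (rate 5.0).
rng = random.Random(0)
sequences = [[rng.expovariate(0.5) for _ in range(30)] for _ in range(20)]
sequences += [[rng.expovariate(5.0) for _ in range(30)] for _ in range(20)]

rates, weights, responsibilities = fit_mixture(sequences, init_rates=[1.0, 2.0])
# Hard cluster label = most probable cluster for each sequence.
labels = [max(range(len(rates)), key=lambda k: row[k]) for row in responsibilities]
print("estimated rates:", rates)
print("cluster labels:", labels)
```

In the setting of the abstract, the closed-form M-step would presumably be replaced by policy-learning updates for each cluster, with the per-sequence likelihood coming from the learned policy rather than from a fixed exponential density.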
