Abstract
Model-based reinforcement learning refers to a set of approaches capable of sample-efficient decision making, which create an explicit model of the environment. This model can subsequently be used for learning optimal policies. In this paper, we propose a temporal Gaussian Mixture Model composed of a perception model and a transition model. The perception model extracts discrete (latent) states from continuous observations using a variational Gaussian mixture likelihood. Importantly, our model constantly monitors the collected data searching for new Gaussian components, i.e., the perception model performs a form of structure learning (Smith et al., 2020; Friston et al., 2018; Neacsu et al., 2022) as it learns the number of Gaussian components in the mixture. Additionally, the transition model learns the temporal transition between consecutive time steps by taking advantage of the Dirichlet-categorical conjugacy. Both the perception and transition models are able to forget part of the data points, while integrating the information they provide within the prior, which ensures fast variational inference. Finally, decision making is performed with a variant of Q-learning which is able to learn Q-values from beliefs over states. Empirically, we have demonstrated the model's ability to learn the structure of several mazes: the model discovered the number of states and the transition probabilities between these states. Moreover, using its learned Q-values, the agent was able to successfully navigate from the starting position to the maze's exit.
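
To make the role of Dirichlet-categorical conjugacy concrete, the following is a minimal sketch of a conjugate transition update, not the implementation used in this paper. It assumes a fixed number of states, a symmetric Dirichlet prior, and soft counts formed as the product of the beliefs at consecutive time steps. The function names (`make_prior`, `update`, `expected_transitions`) and the `concentration` parameter are hypothetical and chosen for illustration only.

```python
def make_prior(n_states, concentration=1.0):
    """Symmetric Dirichlet prior: one row of concentration parameters per source state."""
    return [[concentration] * n_states for _ in range(n_states)]


def update(alpha, belief_prev, belief_next):
    """Conjugate update: add soft transition counts to the Dirichlet parameters.

    Assumption: the soft count for i -> j is belief_prev[i] * belief_next[j].
    """
    n = len(alpha)
    return [
        [alpha[i][j] + belief_prev[i] * belief_next[j] for j in range(n)]
        for i in range(n)
    ]


def expected_transitions(alpha):
    """Posterior mean of each categorical transition distribution (rows sum to 1)."""
    return [[a / sum(row) for a in row] for row in alpha]


# Hypothetical two-state example: one observed transition from state 0 to state 1.
alpha = make_prior(2)
alpha = update(alpha, belief_prev=[1.0, 0.0], belief_next=[0.0, 1.0])
transition_probs = expected_transitions(alpha)
```

Because the Dirichlet is conjugate to the categorical likelihood, the posterior stays a Dirichlet and each update reduces to adding counts. This is one reason a summary of discarded data points can be folded into the prior parameters.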