Non-Markovian Reinforcement Learning using Fractional Dynamics

Abstract

Reinforcement learning (RL) is a technique for learning the control policy of an agent that interacts with a stochastic environment. In any given state, the agent takes some action, and the environment determines the probability distribution over the next state and gives the agent some reward. Most RL algorithms assume that the environment satisfies Markov assumptions (i.e., the probability distribution over the next state depends only on the current state). In this paper, we propose a model-based RL technique for a system that has non-Markovian dynamics. Such environments are common in many real-world applications, such as human physiology, biological systems, material science, and population dynamics. Model-based RL (MBRL) techniques typically try to learn a model of the environment from the data while simultaneously identifying an optimal policy for the learned model. We propose a technique where the non-Markovianity of the system is modeled through a fractional dynamical system. We show that we can quantify the gap between the performance of an MBRL algorithm that uses bounded-horizon model predictive control and that of the optimal policy. Finally, we demonstrate our proposed framework on a pharmacokinetic model of human blood glucose dynamics and show that our fractional models can capture distant correlations on real-world datasets.
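To illustrate how a fractional dynamical system can encode non-Markovian behavior, the following is a minimal sketch, not the paper's model. It assumes a scalar discrete-time fractional-order system of the common Grünwald-Letnikov form, Δ^α x[k+1] = a·x[k], in which the next state depends on a weighted sum over the entire state history. The function names `gl_weights` and `simulate`, and the parameters `alpha` and `a`, are hypothetical and chosen only for this illustration.

```python
def gl_weights(alpha, n):
    """Grünwald-Letnikov weights psi(alpha, j) for j = 0..n.

    psi(alpha, 0) = 1 and psi(alpha, j) = psi(alpha, j-1) * (j - 1 - alpha) / j.
    """
    weights = [1.0]
    for j in range(1, n + 1):
        weights.append(weights[-1] * (j - 1 - alpha) / j)
    return weights


def simulate(alpha, a, x0, steps):
    """Simulate the assumed scalar system Delta^alpha x[k+1] = a * x[k].

    Expanding the fractional difference gives
        x[k+1] = a * x[k] - sum_{j=1}^{k+1} psi(alpha, j) * x[k+1-j],
    so every past state contributes to the next one.
    """
    w = gl_weights(alpha, steps)
    x = [x0]
    for k in range(steps):
        # Memory term: weighted sum over the full state history x[k], ..., x[0].
        memory = sum(w[j] * x[k + 1 - j] for j in range(1, k + 2))
        x.append(a * x[k] - memory)
    return x


if __name__ == "__main__":
    # alpha = 1 recovers an ordinary (Markovian) first-order difference equation.
    print(simulate(1.0, -0.5, 1.0, 5))
    # 0 < alpha < 1 keeps nonzero weights on all past states (long memory).
    print(simulate(0.6, -0.5, 1.0, 5))
```

With `alpha = 1` all weights beyond the first lag vanish and the update reduces to x[k+1] = (1 + a)·x[k], which is Markovian. For a non-integer `alpha` the weights decay slowly but never reach zero, which is one way such a model can represent distant correlations.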
