Real-Time Reinforcement Learning

Abstract

Markov Decision Processes (MDPs), the mathematical framework underlying mostalgorithms in Reinforcement Learning (RL), are often used in a way thatwrongfully assumes that the state of an agent's environment does not changeduring action selection. As RL systems based on MDPs begin to find applicationin real-world safety critical situations, this mismatch between the assumptionsunderlying classical MDPs and the reality of real-time computation may lead toundesirable outcomes. In this paper, we introduce a new framework, in whichstates and actions evolve simultaneously and show how it is related to theclassical MDP formulation. We analyze existing algorithms under the newreal-time formulation and show why they are suboptimal when used in real-time.We then use those insights to create a new algorithm Real-Time Actor-Critic(RTAC) that outperforms the existing state-of-the-art continuous controlalgorithm Soft Actor-Critic both in real-time and non-real-time settings. Codeand videos can be found at https://github.com/rmst/rtrl.

Quick Read (beta)

loading the full paper ...