Reactive Reinforcement Learning in Asynchronous Environments

Abstract

The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored. Frequently used models of the interaction between an agent and its environment, such as Markov Decision Processes (MDP) or Semi-Markov Decision Processes (SMDP), do not capture the fact that, in an asynchronous environment, the state of the environment may change during computation performed by the agent. In an asynchronous environment, minimizing reaction time (the time it takes for an agent to react to an observation) also minimizes the time in which the state of the environment may change following an observation. In many environments, the reaction time of an agent directly impacts task performance by permitting the environment to transition into either an undesirable terminal state or a state where performing the chosen action is inappropriate. We propose a class of reactive reinforcement learning algorithms that address this problem of asynchronous environments by acting immediately after observing new state information. We compare a reactive SARSA learning algorithm with the conventional SARSA learning algorithm on two asynchronous robotic tasks (emergency stopping and impact prevention), and show that the reactive RL algorithm reduces the reaction time of the agent by approximately the duration of the algorithm's learning update. This new class of reactive algorithms may facilitate safer control and faster decision making without any change to standard learning guarantees.
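The abstract does not spell out the exact ordering of operations, so the following is only a minimal Python sketch under stated assumptions. Here "reactive SARSA" is taken to mean choosing A' and sending it to the environment as soon as (R, S') is observed, then performing the usual SARSA update of Q(S, A) while that action executes; conventional SARSA performs the update before sending A'. The chain environment, the simulated clock, and the costs `SELECT_COST` and `UPDATE_COST` are hypothetical, chosen only to make timing deterministic. The sketch illustrates two claims from the abstract in miniature: reaction time shrinks by the duration of the update, and, because both orderings apply the same updates to the same (S, A, R, S', A') tuples, the learned Q-values are identical.

```python
import random
from collections import defaultdict

# Hypothetical per-operation costs in simulated milliseconds (assumptions,
# not measurements from the paper).
SELECT_COST = 1.0  # choosing an action with epsilon-greedy
UPDATE_COST = 5.0  # performing one SARSA learning update


class SimClock:
    """Simulated clock so that reaction times are deterministic."""

    def __init__(self):
        self.now = 0.0

    def spend(self, dt):
        self.now += dt


class ChainEnv:
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.
    Reaching the right end gives reward 1 and ends the episode."""

    def __init__(self, n=5):
        self.n = n
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, self.n - 1) if a == 1 else max(self.s - 1, 0)
        done = self.s == self.n - 1
        return (1.0 if done else 0.0), self.s, done


def epsilon_greedy(Q, s, eps, rng, clock):
    clock.spend(SELECT_COST)
    if rng.random() < eps:
        return rng.choice((0, 1))
    best = max(Q[(s, 0)], Q[(s, 1)])
    return rng.choice([a for a in (0, 1) if Q[(s, a)] == best])  # random tie-break


def sarsa_update(Q, s, a, r, s2, a2, done, alpha, gamma, clock):
    clock.spend(UPDATE_COST)
    target = r if done else r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def run_sarsa(reactive, episodes=50, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Return (Q, reaction_times); each reaction time is the simulated time
    between an observation arriving and the next action being sent."""
    rng, clock, env = random.Random(seed), SimClock(), ChainEnv()
    Q = defaultdict(float)
    reaction_times = []
    for _ in range(episodes):
        s = env.reset()
        t_obs = clock.now
        a = epsilon_greedy(Q, s, eps, rng, clock)
        reaction_times.append(clock.now - t_obs)
        r, s2, done = env.step(a)
        while True:
            t_obs = clock.now  # observation (r, s2) arrives
            if done:
                sarsa_update(Q, s, a, r, s2, None, True, alpha, gamma, clock)
                break
            a2 = epsilon_greedy(Q, s2, eps, rng, clock)
            if reactive:
                # Reactive ordering: send A' first, learn while it executes.
                reaction_times.append(clock.now - t_obs)
                nr, ns, ndone = env.step(a2)
                sarsa_update(Q, s, a, r, s2, a2, False, alpha, gamma, clock)
            else:
                # Conventional ordering: learn first, then send A'.
                sarsa_update(Q, s, a, r, s2, a2, False, alpha, gamma, clock)
                reaction_times.append(clock.now - t_obs)
                nr, ns, ndone = env.step(a2)
            s, a, r, s2, done = s2, a2, nr, ns, ndone
    return Q, reaction_times


if __name__ == "__main__":
    Q_c, rt_c = run_sarsa(reactive=False)
    Q_r, rt_r = run_sarsa(reactive=True)
    print("identical Q-values:", dict(Q_c) == dict(Q_r))
    print("max reaction time, conventional:", max(rt_c))
    print("max reaction time, reactive:    ", max(rt_r))
```

In this sketch every reactive reaction time equals `SELECT_COST`, while conventional reaction times after the first step of each episode equal `SELECT_COST + UPDATE_COST`. The simulated environment does not change state while the agent computes, so the sketch measures only latency; it does not reproduce the asynchronous effects (such as a missed emergency stop) that motivate the method.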
