Reinforcement Learning in Non-Stationary Environments

Abstract

Reinforcement learning (RL) methods learn optimal decisions when the environment is stationary. However, the stationarity assumption is very restrictive. In many real-world problems, such as traffic signal control and robotics, the environment is non-stationary, and in these scenarios RL methods yield sub-optimal decisions. In this paper, we therefore consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment. The goal is to maximize the long-term discounted reward when the underlying model of the environment changes over time. To achieve this, we first adapt a change point algorithm to detect changes in the statistics of the environment and then develop an RL algorithm that maximizes the long-run reward accrued. We illustrate that our change point method detects changes in the model of the environment effectively and thus helps the RL algorithm maximize the long-run reward. We further validate the effectiveness of the proposed solution on non-stationary random Markov decision processes, a sensor energy management problem, and a traffic signal control problem.
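
The detection step mentioned above can be illustrated with a minimal sketch. It is not the change point method adapted in the paper: it assumes a generic two-sided CUSUM test on a scalar stream (for example, observed rewards), and the names `CusumDetector`, `first_change_index`, `target_mean`, `drift`, and `threshold` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CusumDetector:
    """Two-sided CUSUM on a scalar stream (illustrative; not the paper's detector)."""

    target_mean: float      # assumed pre-change mean of the monitored statistic
    drift: float = 0.5      # slack: deviations smaller than this are ignored
    threshold: float = 5.0  # alarm level for either cumulative sum
    pos: float = 0.0        # accumulated evidence of an upward shift
    neg: float = 0.0        # accumulated evidence of a downward shift

    def update(self, x: float) -> bool:
        """Feed one observation; return True if a change is flagged."""
        dev = x - self.target_mean
        self.pos = max(0.0, self.pos + dev - self.drift)
        self.neg = max(0.0, self.neg - dev - self.drift)
        if self.pos > self.threshold or self.neg > self.threshold:
            # Reset the sums so the detector can be reused after the alarm.
            self.pos = 0.0
            self.neg = 0.0
            return True
        return False


def first_change_index(stream, detector):
    """Return the index at which the detector first fires, or None."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: the mean shifts from 0.0 to 2.0 at index 20.
    stream = [0.0] * 20 + [2.0] * 20
    print(first_change_index(stream, CusumDetector(target_mean=0.0)))
```

In a detect-then-adapt scheme of the kind the abstract describes, an alarm from such a detector would signal the learner to stop relying on value estimates fitted to the old environment model, for instance by starting or switching to a separate set of estimates. How the paper handles that step is not specified in this abstract.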
