Beware of Instantaneous Dependence in Reinforcement Learning

Abstract

Environment models play an important role in Model-Based Reinforcement Learning (MBRL), where they aim to predict future states based on the past. Existing works usually ignore instantaneous dependence in the state; that is, they assume that the future state variables are conditionally independent given the past states. However, instantaneous dependence is prevalent in many RL environments. For instance, in the stock market, instantaneous dependence can exist between two stocks because the fluctuation of one stock can quickly affect the other, and this effect propagates faster than the time resolution at which price changes are observed. In this paper, we prove that, with few exceptions, ignoring instantaneous dependence can result in suboptimal policy learning in MBRL. To address this suboptimality problem, we propose a simple plug-and-play method that enables existing MBRL algorithms to take instantaneous dependence into account. Through experiments on two benchmarks, we (1) confirm the existence of instantaneous dependence with visualization; (2) validate our theoretical finding that ignoring instantaneous dependence leads to suboptimal policies; and (3) verify that our method effectively enables reinforcement learning with instantaneous dependence and improves policy performance.
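To make the notion of instantaneous dependence concrete, the following toy sketch may help. It is not the method proposed in the paper; it is an illustration under assumed simplifications: two state variables (think of two stocks) whose next-step noise terms are jointly Gaussian with unit variance and correlation rho. The function names simulate_residuals, correlation, and factorization_gap are hypothetical. The sketch shows that the correlation remains after conditioning on the past, and it uses the standard closed form for the divergence between a correlated bivariate Gaussian and the product of its marginals to quantify what a conditionally independent model cannot represent.

```python
import math
import random


def simulate_residuals(n, rho, seed=0):
    """Draw n pairs of next-state noise terms with correlation rho.

    Assumption for illustration: the part of the next state that is not
    explained by the past is a unit-variance bivariate Gaussian.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        e1 = z1
        # Mixing z1 into e2 creates the dependence within the same time step.
        e2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        pairs.append((e1, e2))
    return pairs


def correlation(pairs):
    """Sample Pearson correlation of the two residual components."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(v1 * v2)


def factorization_gap(rho):
    """KL divergence (in nats) from the product of the marginals to a
    bivariate Gaussian with correlation rho.

    This is the expected per-step log-likelihood that a model assuming
    conditional independence gives up relative to the true joint.
    """
    return -0.5 * math.log(1.0 - rho ** 2)


if __name__ == "__main__":
    residuals = simulate_residuals(20000, rho=0.8)
    print("sample correlation of residuals:", correlation(residuals))
    print("gap of the factorized model (nats):", factorization_gap(0.8))
```

In this toy setting the gap is zero only when rho is zero, and it grows without bound as the absolute value of rho approaches one. How such a modeling gap translates into policy suboptimality is the subject of the paper's analysis and is not reproduced here.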
