Abstract
In reinforcement learning, we can learn a model of future observations andrewards, and use it to plan the agent's next actions. However, jointly modelingfuture observations can be computationally expensive or even intractable if theobservations are high-dimensional (e.g. images). For this reason, previousworks have considered partial models, which model only part of the observation.In this paper, we show that partial models can be causally incorrect: they areconfounded by the observations they don't model, and can therefore lead toincorrect planning. To address this, we introduce a general family of partialmodels that are provably causally correct, yet remain fast because they do notneed to fully model future observations.
Quick Read (beta)
loading the full paper ...