Counterfactual equivalence for POMDPs, and underlying deterministic environments

Abstract

Partially Observable Markov Decision Processes (POMDPs) are rich environments often used in machine learning. However, the information and causal structures of POMDPs have been relatively little studied. This paper presents the concepts of equivalent and counterfactually equivalent POMDPs, where agents cannot distinguish which environment they are in through any observations and actions. It shows that any POMDP is counterfactually equivalent, for any finite number of turns, to a deterministic POMDP with all uncertainty concentrated into the initial state. This allows a better understanding of POMDP uncertainty, information, and learning.
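The central idea, moving all randomness into the initial state, can be illustrated with a minimal sketch. It assumes a hypothetical two-state toy POMDP (`transition`, `observe` and every other name below are our own illustration, not from the paper) and is not presented as the paper's construction. The "initial state" of the deterministic version is a table pre-sampling the observation for every reachable (history, action) pair over a finite horizon. After that, every step is a pure lookup. The check only compares observation distributions for one fixed action sequence. Because the single table also fixes what the agent would have seen under every other action sequence, it hints at the counterfactual part of the claim.

```python
import random
from collections import Counter

# Hypothetical toy POMDP (our illustration, not from the paper):
# hidden states {0, 1}, actions {0, 1}, observations {0, 1}.
ACTIONS = (0, 1)

def transition(s, a, rng):
    """Stochastic transition: action 1 flips the hidden state with prob. 0.8."""
    if a == 1 and rng.random() < 0.8:
        return 1 - s
    return s

def observe(s, rng):
    """Noisy observation: reports the true hidden state with prob. 0.9."""
    return s if rng.random() < 0.9 else 1 - s

def sample_outcome_table(horizon, rng, s0=0):
    """Pre-sample, once, the observation for every (history, action) pair
    reachable within `horizon` turns. This table plays the role of the
    deterministic environment's initial state: all randomness lives here."""
    hidden = {(): s0}   # hidden state reached after each history
    outcome = {}        # (history, action) -> observation
    frontier = [()]
    for _ in range(horizon):
        next_frontier = []
        for h in frontier:
            for a in ACTIONS:
                s_next = transition(hidden[h], a, rng)
                o = observe(s_next, rng)
                outcome[(h, a)] = o
                child = h + ((a, o),)
                hidden[child] = s_next
                next_frontier.append(child)
        frontier = next_frontier
    return outcome

def run_deterministic(outcome, actions):
    """Deterministic dynamics: each step is a pure table lookup."""
    h, obs = (), []
    for a in actions:
        o = outcome[(h, a)]
        obs.append(o)
        h = h + ((a, o),)
    return tuple(obs)

def run_stochastic(actions, rng, s0=0):
    """The original POMDP, drawing fresh randomness at every turn."""
    s, obs = s0, []
    for a in actions:
        s = transition(s, a, rng)
        obs.append(observe(s, rng))
    return tuple(obs)

def empirical(sampler, n):
    """Empirical distribution over observation sequences from n samples."""
    counts = Counter(sampler() for _ in range(n))
    return {k: v / n for k, v in counts.items()}

if __name__ == "__main__":
    rng = random.Random(0)
    acts = (1, 1, 0)
    p_det = empirical(lambda: run_deterministic(sample_outcome_table(len(acts), rng), acts), 20000)
    p_sto = empirical(lambda: run_stochastic(acts, rng), 20000)
    for k in sorted(set(p_det) | set(p_sto)):
        print(k, round(p_det.get(k, 0.0), 3), round(p_sto.get(k, 0.0), 3))
```

The two printed columns should agree up to sampling noise. The table grows exponentially with the horizon, so this sketch only makes sense for a finite number of turns.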
