OCMDP: Observation-Constrained Markov Decision Process

Abstract

In many practical applications, decision-making processes must balance the costs of acquiring information with the benefits it provides. Traditional control systems often assume full observability, an unrealistic assumption when observations are expensive. We tackle the challenge of simultaneously learning observation and control strategies in such cost-sensitive environments by introducing the Observation-Constrained Markov Decision Process (OCMDP), where the policy influences the observability of the true state. To manage the complexity arising from the combined observation and control actions, we develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy. This decomposition enables efficient learning in the expanded action space by focusing on when and what to observe, as well as determining optimal control actions, without requiring knowledge of the environment's dynamics. We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole. In both scenarios, the experimental results show that our model substantially reduces observation costs on average and outperforms baseline methods in efficiency.
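
To make the setting concrete, the following is a minimal sketch of an observation-constrained interaction loop, not the algorithm proposed in this paper. It assumes a hypothetical 1-D chain environment (`ToyOCMDP`), a fixed per-observation cost (`obs_cost`), and hand-written sensing and control policies in place of learned ones; all of these names and values are illustrative. It shows only the two ideas stated above: the true state is revealed only when the agent pays to observe, and the sensing decision is made separately from the control decision.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyOCMDP:
    """Hypothetical 1-D chain; a stand-in, not the paper's environment."""
    n_states: int = 5
    obs_cost: float = 0.1  # assumed cost charged per observation
    state: int = 0

    def step(self, observe: bool, control: int) -> Tuple[Optional[int], float, bool]:
        # Apply the control action (-1 or +1), clipped to the chain.
        self.state = max(0, min(self.n_states - 1, self.state + control))
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        if observe:
            reward -= self.obs_cost  # sensing is paid for out of the reward
        # The true state is revealed only when the agent chose to observe.
        return (self.state if observe else None), reward, done


def rollout(env, sensing_policy, control_policy, max_steps=20):
    """Run one episode with separate sensing and control policies."""
    last_obs, total, n_obs = env.state, 0.0, 0  # initial state assumed known
    for t in range(max_steps):
        observe = sensing_policy(t)          # when to observe
        control = control_policy(last_obs)   # how to act on the last observation
        obs, reward, done = env.step(observe, control)
        total += reward
        if obs is not None:
            last_obs, n_obs = obs, n_obs + 1
        if done:
            break
    return total, n_obs


# Same control policy (always move right), two different sensing policies.
always = rollout(ToyOCMDP(), lambda t: True, lambda s: 1)
sparse = rollout(ToyOCMDP(), lambda t: t % 2 == 0, lambda s: 1)
```

In this toy case both sensing policies reach the goal, but the sparse one pays for fewer observations and so collects a higher return, which is the trade-off an OCMDP policy is meant to learn.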
