Hybrid quantum-classical algorithm for near-optimal planning in POMDPs

Abstract

Reinforcement learning (RL) provides a principled framework fordecision-making in partially observable environments, which can be modeled asMarkov decision processes and compactly represented through dynamic decisionBayesian networks. Recent advances demonstrate that inference on sparseBayesian networks can be accelerated using quantum rejection sampling combinedwith amplitude amplification, leading to a computational speedup in estimatingacceptance probabilities.\\ Building on this result, we introduce QuantumBayesian Reinforcement Learning (QBRL), a hybrid quantum-classical look-aheadalgorithm for model-based RL in partially observable environments. We present arigorous, oracle-free time complexity analysis under fault-tolerant assumptionsfor the quantum device. Unlike standard treatments that assume a black-boxoracle, we explicitly specify the inference process, allowing our bounds tomore accurately reflect the true computational cost. We show that, forenvironments whose dynamics form a sparse Bayesian network, horizon-basednear-optimal planning can be achieved sub-quadratically faster throughquantum-enhanced belief updates. Furthermore, we present numerical experiments benchmarking QBRL against itsclassical counterpart on simple yet illustrative decision-making tasks. Ourresults offer a detailed analysis of how the quantum computational advantagetranslates into decision-making performance, highlighting that the magnitude ofthe advantage can vary significantly across different deployment settings.

Quick Read (beta)

loading the full paper ...