Bayesian Reinforcement Learning in Factored POMDPs

Abstract

Bayesian approaches provide a principled solution to the exploration-exploitation trade-off in Reinforcement Learning. Typical approaches, however, either assume a fully observable environment or scale poorly. This work introduces the Factored Bayes-Adaptive POMDP model, a framework that can exploit the underlying structure while learning the dynamics in partially observable systems. We also present a belief tracking method to approximate the joint posterior over state and model variables, and an adaptation of the Monte-Carlo Tree Search solution method, which together are capable of solving the underlying problem near-optimally. Our method can learn efficiently given a known factorization, or learn the factorization and the model parameters at the same time. We demonstrate that this approach is able to outperform current methods and tackle problems that were previously infeasible.
