Abstract
Bayesian approaches provide a principled solution to the exploration-exploitation trade-off in Reinforcement Learning. Typical approaches, however, either assume a fully observable environment or scale poorly. This work introduces the Factored Bayes-Adaptive POMDP model, a framework that exploits the underlying structure while learning the dynamics in partially observable systems. We also present a belief tracking method to approximate the joint posterior over state and model variables, and an adaptation of the Monte-Carlo Tree Search solution method, which together are capable of solving the underlying problem near-optimally. Our method learns efficiently given a known factorization, and can also learn the factorization and the model parameters at the same time. We demonstrate that this approach outperforms current methods and tackles problems that were previously infeasible.