Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens

Abstract

Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations, incorporating uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data, while demonstrating resilience to changing environment dynamics, improving the flexibility, generalizability, and robustness of offline-learned policies.
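
The idea of updating a belief over dynamics from observed transitions and then marginalizing over that belief when choosing actions can be illustrated with a toy sketch. This is not the RefPlan algorithm itself: it assumes a small discrete set of hypothetical 1-D dynamics models, a Gaussian observation likelihood, and a one-step lookahead, all chosen purely for illustration. The names `update_belief`, `marginal_value`, `gains`, and `reward_fn` are hypothetical.

```python
import numpy as np

def update_belief(log_w, models, s, a, s_next, noise_std=0.1):
    """One Bayes step: add each model's Gaussian log-likelihood of the transition."""
    preds = np.array([f(s, a) for f in models])
    log_lik = -0.5 * ((s_next - preds) / noise_std) ** 2
    log_w = log_w + log_lik
    return log_w - np.logaddexp.reduce(log_w)  # normalize in log space

def marginal_value(models, weights, s, a, reward_fn):
    """One-step reward of action a, averaged over the belief (marginalization)."""
    return sum(w * reward_fn(f(s, a)) for f, w in zip(models, weights))

# Hypothetical candidate dynamics: s' = s + k * a for several gains k.
gains = [0.5, 1.0, 2.0]
models = [lambda s, a, k=k: s + k * a for k in gains]
log_w = np.log(np.full(len(models), 1.0 / len(models)))  # uniform prior

# "Real-time observations" generated by an assumed true gain of 2.0.
s = 0.0
for a in [1.0, -0.5, 0.25]:
    s_next = s + 2.0 * a
    log_w = update_belief(log_w, models, s, a, s_next)
    s = s_next
weights = np.exp(log_w)

# Plan: pick the action with the best belief-averaged reward (target state 1.0).
reward_fn = lambda s_next: -(s_next - 1.0) ** 2
actions = np.linspace(-1.0, 1.0, 41)
best_action = max(actions, key=lambda a: marginal_value(models, weights, s, a, reward_fn))
```

In this sketch the belief concentrates on the model that explains the observed transitions, and the planner's action choice shifts accordingly. A fixed policy learned offline would have no such mechanism for adapting at deployment.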
