Abstract
Offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, its performance often suffers from the objective mismatch between model and policy learning, resulting in inferior performance despite accurate model predictions. This paper first identifies that the primary source of this mismatch is the underlying confounders present in offline data for MBRL. Subsequently, we introduce \textbf{B}ilin\textbf{E}ar \textbf{CAUS}al r\textbf{E}presentation~(BECAUSE), an algorithm that captures causal representations for both states and actions to reduce the influence of distribution shift, thus mitigating the objective mismatch problem. Comprehensive evaluations on 18 tasks that vary in data quality and environment context demonstrate the superior performance of BECAUSE over existing offline RL algorithms. We show the generalizability and robustness of BECAUSE under fewer samples or larger numbers of confounders. Additionally, we offer a theoretical analysis of BECAUSE to prove its error bound and sample efficiency when integrating causal representation into offline MBRL.