Abstract
The idea of decision-aware model learning, that models should be accurate where it matters for decision-making, has gained prominence in model-based reinforcement learning. While promising theoretical results have been established, the empirical performance of algorithms leveraging a decision-aware loss has been lacking, especially in continuous control problems. In this paper, we present a study on the necessary components for decision-aware reinforcement learning models and we showcase design choices that enable well-performing algorithms. To this end, we provide a theoretical and empirical investigation into prominent algorithmic ideas in the field. We highlight that empirical design decisions established in the MuZero line of works are vital to achieving good performance for related algorithms, and we showcase differences in behavior between different instantiations of value-aware algorithms in stochastic environments. Using these insights, we propose the Latent Model-Based Decision-Aware Actor-Critic framework ($\lambda$-AC) for decision-aware model-based reinforcement learning in continuous state-spaces and highlight important design choices in different environments.