Look Before Leap: Look-Ahead Planning with Uncertainty in Reinforcement Learning

Abstract

Model-based reinforcement learning (MBRL) has demonstrated superior sample efficiency compared to model-free reinforcement learning (MFRL). However, the presence of inaccurate models can introduce biases during policy learning, resulting in misleading trajectories. The challenge lies in obtaining accurate models due to limited diverse training data, particularly in regions with limited visits (uncertain regions). Existing approaches passively quantify uncertainty after sample generation, failing to actively collect uncertain samples that could enhance state coverage and improve model accuracy. Moreover, MBRL often faces difficulties in making accurate multi-step predictions, thereby impacting overall performance. To address these limitations, we propose a novel framework for uncertainty-aware policy optimization with model-based exploratory planning. In the model-based planning phase, we introduce an uncertainty-aware k-step lookahead planning approach to guide action selection at each step. This process involves a trade-off analysis between model uncertainty and value function approximation error, effectively enhancing policy performance. In the policy optimization phase, we leverage an uncertainty-driven exploratory policy to actively collect diverse training samples, resulting in improved model accuracy and overall performance of the RL agent. Our approach offers flexibility and applicability to tasks with varying state/action spaces and reward structures. We validate its effectiveness through experiments on challenging robotic manipulation tasks and Atari games, surpassing state-of-the-art methods with fewer interactions, thereby leading to significant performance improvements.
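
The abstract does not spell out the planner, so the following is only an illustrative sketch of what an uncertainty-aware k-step lookahead could look like, not the method proposed in the paper. It assumes a small discrete action set, an ensemble of learned dynamics models whose disagreement (standard deviation across predictions) stands in for model uncertainty, and a learned value function that bootstraps the return after k steps. The names `lookahead_action`, `models`, `reward_fn`, `value_fn`, and the penalty weight `beta` are all hypothetical.

```python
import itertools
import numpy as np

def lookahead_action(state, models, reward_fn, value_fn, actions,
                     k=3, gamma=0.99, beta=1.0):
    """Pick the first action of the best k-step action sequence.

    Assumptions (not from the paper): `models` is a list of callables
    m(state, action) -> next_state, ensemble disagreement is the
    uncertainty estimate, and all len(actions)**k sequences are
    enumerated, so this only suits small discrete action sets.
    """
    best_score, best_first = -np.inf, None
    for seq in itertools.product(actions, repeat=k):
        s = np.asarray(state, dtype=float)
        score = 0.0
        for t, a in enumerate(seq):
            # Each ensemble member predicts the next state.
            preds = np.stack([np.asarray(m(s, a), dtype=float) for m in models])
            # Disagreement across members as a proxy for model uncertainty.
            uncertainty = preds.std(axis=0).mean()
            s_next = preds.mean(axis=0)
            # Discounted reward, penalized where the model is unsure.
            score += gamma**t * (reward_fn(s, a, s_next) - beta * uncertainty)
            s = s_next
        # Bootstrap with the value function at the k-step horizon.
        score += gamma**k * value_fn(s)
        if score > best_score:
            best_score, best_first = score, seq[0]
    return best_first
```

In this sketch, a larger `k` relies more on the learned model (and accumulates more of its error), while a smaller `k` relies more on the value estimate; `beta` controls how strongly uncertain predictions are discounted during planning. An exploration policy could flip the sign of the penalty to seek out uncertain regions instead, again as an assumption rather than a description of the authors' design.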
