Tackling Long-Horizon Tasks with Model-based Offline Reinforcement Learning

Abstract

Model-based offline reinforcement learning (RL) is a compelling approach that addresses the challenge of learning from limited, static data by generating imaginary trajectories using learned models. However, it falls short in solving long-horizon tasks due to high bias in value estimation from model rollouts. In this paper, we introduce a novel model-based offline RL method, Lower Expectile Q-learning (LEQ), which enhances long-horizon task performance by mitigating the high bias in model-based value estimation via expectile regression of $\lambda$-returns. Our empirical results show that LEQ significantly outperforms previous model-based offline RL methods on long-horizon tasks, such as the D4RL AntMaze tasks, matching or surpassing the performance of model-free approaches. Our experiments demonstrate that expectile regression, $\lambda$-returns, and critic training on offline data are all crucial for addressing long-horizon tasks. Additionally, LEQ achieves performance comparable to the state-of-the-art model-based and model-free offline RL methods on the NeoRL benchmark and the D4RL MuJoCo Gym tasks.
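
As a rough illustration of the two ingredients named in the abstract, the sketch below computes $\lambda$-returns over a single imagined rollout and scores a critic prediction against one such return with an asymmetric (expectile) squared loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `lambda_returns` and `expectile_loss`, the argument layout, and the default values of `gamma`, `lam`, and `tau` are all hypothetical, and the exact loss used by LEQ may differ in detail.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.95):
    """Compute lambda-returns backward over one rollout.

    rewards[t] is the reward at step t, and next_values[t] is the critic's
    value estimate for the state reached after step t (assumed layout).
    """
    returns = [0.0] * len(rewards)
    next_return = next_values[-1]  # bootstrap from the final state's value
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer return, weighted by lam.
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * next_values[t] + lam * next_return
        )
        returns[t] = next_return
    return returns


def expectile_loss(q_value, target, tau=0.1):
    """Asymmetric squared error between a prediction and a target.

    With tau < 0.5, targets above the prediction get the smaller weight,
    so minimizing this loss pulls the prediction toward a lower expectile
    of the target distribution (a conservative estimate).
    """
    diff = target - q_value
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff
```

With `lam=0` the return reduces to the one-step bootstrapped target, and with `lam=1` it reduces to the full discounted rollout return plus the bootstrapped final value. Choosing `tau` below 0.5 is what makes the estimate "lower": overestimated targets from an inaccurate model are weighted less than underestimated ones.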
