An Investigation of Offline Reinforcement Learning in Factorisable Action Spaces

Abstract

Extending reinforcement learning (RL) to offline domains opens promising prospects, particularly in sectors where data collection poses substantial challenges or risks. Pivotal to the success of transferring RL offline is mitigating overestimation bias in value estimates for state-action pairs absent from the data. Whilst numerous approaches have been proposed in recent years, these tend to focus primarily on continuous or small-scale discrete action spaces. Factorised discrete action spaces, on the other hand, have received relatively little attention, despite many real-world problems naturally having factorisable actions. In this work, we undertake a formative investigation into offline reinforcement learning in factorisable action spaces. Using value decomposition as formulated in DecQN as a foundation, we present the case for a factorised approach and conduct an extensive empirical evaluation of several offline techniques adapted to the factorised setting. In the absence of established benchmarks, we introduce a suite of our own comprising datasets of varying quality and task complexity. Advocating for reproducible research and innovation, we make all datasets available for public use alongside our code base.
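To make the case for a factorised approach concrete, the following is a minimal sketch of value decomposition, assuming the DecQN form in which the joint action value is the mean of per-dimension utilities, Q(s, a) = (1/N) Σ_i U_i(s, a_i). The function names (`decomposed_q`, `greedy_action`, `brute_force_greedy`) and the utility values are hypothetical and purely illustrative; in practice the utilities would be produced by a network with one output head per action dimension, and no offline correction is shown here.

```python
import math
from itertools import product
from typing import Sequence, Tuple

# Hypothetical per-dimension utilities U_i(s, a_i) for ONE fixed state s.
# In a DecQN-style agent these would come from a network head per action
# dimension; here they are arbitrary illustrative numbers.
Utilities = Sequence[Sequence[float]]


def decomposed_q(utilities: Utilities, action: Sequence[int]) -> float:
    """Q(s, a) = (1/N) * sum_i U_i(s, a_i): the mean of per-dimension utilities."""
    return sum(u[a_i] for u, a_i in zip(utilities, action)) / len(utilities)


def greedy_action(utilities: Utilities) -> Tuple[int, ...]:
    """Pick argmax_j U_i(s, j) independently in each dimension.

    Each term of the mean depends on a single sub-action, so maximising the
    terms separately maximises Q(s, a) over the whole joint action space
    while only inspecting sum(n_i) values instead of prod(n_i) joint actions.
    """
    return tuple(max(range(len(u)), key=u.__getitem__) for u in utilities)


def brute_force_greedy(utilities: Utilities) -> Tuple[int, ...]:
    """Reference check: enumerate all prod(n_i) joint actions explicitly."""
    ranges = [range(len(u)) for u in utilities]
    return max(product(*ranges), key=lambda a: decomposed_q(utilities, a))


if __name__ == "__main__":
    U = [[0.1, 0.5, 0.2], [1.0, -0.3], [0.0, 0.4, 0.9, 0.2]]
    sizes = [len(u) for u in U]
    print("joint actions:", math.prod(sizes))  # 3 * 2 * 4 = 24
    print("utility outputs:", sum(sizes))       # 3 + 2 + 4 = 9
    a = greedy_action(U)
    print("greedy action:", a)                  # (1, 0, 2)
    assert a == brute_force_greedy(U)
    print("Q(s, a):", decomposed_q(U, a))
```

With three sub-actions the factorised view needs 9 utility outputs rather than 24 joint-action values, and this gap grows exponentially as dimensions are added. The trade-off is that a mean of independent utilities cannot represent arbitrary interactions between sub-actions.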
