Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings

Abstract

Reinforcement learning (RL) can be used to learn treatment policies and aid decision making in healthcare. However, given the need for generalization over complex state/action spaces, the incorporation of function approximators (e.g., deep neural networks) requires model selection to reduce overfitting and improve policy performance at deployment. Yet a standard validation pipeline for model selection requires running a learned policy in the actual environment, which is often infeasible in a healthcare setting. In this work, we investigate a model selection pipeline for offline RL that relies on off-policy evaluation (OPE) as a proxy for validation performance. We present an in-depth analysis of popular OPE methods, highlighting the additional hyperparameters and computational requirements (fitting/inference of auxiliary models) when used to rank a set of candidate policies. We compare the utility of different OPE methods as part of the model selection pipeline in the context of learning to treat patients with sepsis. Among all the OPE methods we considered, fitted Q evaluation (FQE) consistently leads to the best validation ranking, but at a high computational cost. To balance this trade-off between accuracy of ranking and computational efficiency, we propose a simple two-stage approach to accelerate model selection by avoiding potentially unnecessary computation. Our work serves as a practical guide for offline RL model selection and can help RL practitioners select policies using real-world datasets. To facilitate reproducibility and future extensions, the code accompanying this paper is available online at https://github.com/MLD3/OfflineRL_ModelSelection.
