Abstract
Visual reinforcement learning (RL) has made significant progress in recent years, but the choice of visual feature extractor remains a crucial design decision. This paper compares RL algorithms that train a convolutional neural network (CNN) from scratch with those that use pre-trained visual representations (PVRs). We evaluate the Dormant Ratio Minimization (DRM) algorithm, a state-of-the-art visual RL method, against three PVRs: ResNet18, DINOv2, and Visual Cortex (VC). We use the Metaworld Push-v2 and Drawer-Open-v2 tasks for our comparison. Our results show that whether training from scratch or using PVRs yields higher performance is task-dependent, but PVRs offer advantages in terms of reduced replay buffer size and faster training times. We also identify a strong correlation between the dormant ratio and model performance, highlighting the importance of exploration in visual RL. Our study provides insights into the trade-offs between training from scratch and using PVRs, informing the design of future visual RL algorithms.