Abstract
As surgical robots become more common, automating away some of the burden of complex direct human operation becomes ever more feasible. Model-free reinforcement learning (RL) is a promising direction toward generalizable automated surgical performance, but progress has been slowed by the lack of efficient and realistic learning environments. In this paper, we describe adding reinforcement learning support to the da Vinci Skill Simulator, a training simulation used around the world to allow surgeons to learn and rehearse technical skills. We successfully teach an RL-based agent to perform sub-tasks in the simulator environment, using either image or state data. As far as we know, this is the first time an RL-based agent has been taught from visual data in a surgical robotics environment. Additionally, we tackle the sample inefficiency of RL using a simple-to-implement system which we term hybrid-batch learning (HBL), effectively adding a second, long-term replay buffer to the Q-learning process. This also allows us to bootstrap learning from images using data collected during the easier task of learning from state. We show that HBL decreases our learning times significantly.
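
To make the idea of a second, long-term replay buffer concrete, the following is a minimal illustrative sketch and not the implementation used in this work. The class name `HybridReplay`, the buffer capacities, the fixed `long_fraction` mixing ratio, and the `preload` helper are all hypothetical choices made for the example; transitions are assumed to be opaque tuples such as (state, action, reward, next state, done).

```python
import random
from collections import deque


class HybridReplay:
    """Illustrative two-buffer replay: a small recent buffer plus a large long-term one.

    All names and default values here are assumptions for the sketch.
    """

    def __init__(self, short_capacity=1_000, long_capacity=100_000,
                 long_fraction=0.5, seed=None):
        # Both buffers evict their oldest entries once full (FIFO).
        self.short = deque(maxlen=short_capacity)
        self.long = deque(maxlen=long_capacity)
        # Assumed fixed share of each batch drawn from the long-term buffer.
        self.long_fraction = long_fraction
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store a freshly collected transition in both buffers."""
        self.short.append(transition)
        self.long.append(transition)

    def preload(self, transitions):
        """Seed the long-term buffer with previously collected data,
        for example transitions gathered by an earlier, easier run."""
        self.long.extend(transitions)

    def sample(self, batch_size):
        """Draw a mixed batch; returns fewer items if the buffers are too small."""
        n_long = min(int(batch_size * self.long_fraction), len(self.long))
        n_short = min(batch_size - n_long, len(self.short))
        batch = self.rng.sample(list(self.long), n_long)
        batch += self.rng.sample(list(self.short), n_short)
        return batch
```

In a sketch like this, a Q-learning update would call `sample` in place of sampling from a single buffer, and `preload` is one possible way to reuse data from a state-based run when starting an image-based one.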