Reinforcement learning (RL) can be used to fit a mapping from patient state to a medication regimen. Prior studies have used deterministic, value-based tabular learning to learn a propofol dose from an observed anesthetic state. Deep RL replaces the table with a deep neural network and has been used to learn medication regimens from registry databases. Here we present the first application of deep RL to closed-loop control of anesthetic dosing in a simulated environment. We use the cross-entropy method to train a deep neural network that maps an observed anesthetic state to the probability of infusing a fixed propofol dose. During testing, we implement a deterministic policy that transforms this probability of infusion into a continuous infusion rate. The model is trained and tested on simulated pharmacokinetic/pharmacodynamic models with randomized parameters to ensure robustness to patient variability. The deep RL agent significantly outperformed a proportional-integral-derivative (PID) controller (median absolute performance error 1.7% ± 0.6 vs. 3.4% ± 1.2, respectively). Modeling continuous input variables instead of a table affords more robust pattern recognition and makes use of our prior domain knowledge. Deep RL learned a smooth policy with a natural interpretation to data scientists and anesthesia care providers alike.
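The two quantities the abstract relies on at test time can be made concrete with a short sketch. It computes the median absolute performance error (MDAPE) using the common definition PE = 100 × (measured − target) / target, and shows one possible deterministic test-time policy that scales a fixed dose rate by the network's infusion probability. The linear scaling in `infusion_rate` and the parameter name `fixed_dose_rate` are hypothetical illustrations, not the authors' stated transform, and the numbers in the usage note are made-up example values.

```python
import statistics
from typing import List, Sequence


def performance_errors(measured: Sequence[float], target: Sequence[float]) -> List[float]:
    """Performance error (%) at each sample: 100 * (measured - target) / target."""
    return [100.0 * (m - t) / t for m, t in zip(measured, target)]


def mdape(measured: Sequence[float], target: Sequence[float]) -> float:
    """Median absolute performance error (%) over a controlled trajectory."""
    return statistics.median(abs(pe) for pe in performance_errors(measured, target))


def infusion_rate(p_infuse: float, fixed_dose_rate: float) -> float:
    """Hypothetical deterministic policy: turn the network's probability of
    infusing a fixed dose into a continuous rate by linear scaling (assumption)."""
    if not 0.0 <= p_infuse <= 1.0:
        raise ValueError("p_infuse must be a probability in [0, 1]")
    return p_infuse * fixed_dose_rate


if __name__ == "__main__":
    # Made-up anesthetic-depth readings around a target of 50.
    print(mdape([52, 48, 50, 55], [50, 50, 50, 50]))  # 4.0
    print(infusion_rate(0.25, 200.0))                 # 50.0
```

With readings of 52, 48, 50 and 55 against a target of 50, the absolute errors are 4%, 4%, 0% and 10%, so the MDAPE is their median, 4.0%. MDAPE is a median rather than a mean, so a few large transient deviations (such as during induction) do not dominate the score.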