Treatment policies learned via reinforcement learning (RL) from observational health data are sensitive to subtle choices in study design. We highlight a simple approach, trajectory inspection, to bring clinicians into an iterative design process for model-based RL studies. We inspect trajectories where the model recommends unexpectedly aggressive treatments or believes its recommendations would lead to much more positive outcomes. Then, we examine clinical trajectories simulated with the learned model and policy alongside the actual hospital course to uncover possible modeling issues. To demonstrate that this approach yields insights, we apply it to recent work on RL for inpatient sepsis management. We find that a design choice around maximum trajectory length leads to a model bias towards discharge, that the RL policy preference for high vasopressor doses may be linked to small sample sizes, and that the model has a clinically implausible expectation of discharge without weaning off vasopressors.