In recent years, large amounts of electronic health records (EHRs) concerning chronic diseases have been collected to facilitate medical diagnosis. Dynamic treatment regimes (DTRs) provide an efficient way to model the dynamic properties of EHRs related to chronic diseases. Although reinforcement learning (RL) is widely used to construct DTRs, developing RL algorithms that can effectively handle large amounts of data remains an active area of research. In this paper, we present a scalable kernel-based distributed Q-learning algorithm for generating DTRs. We assess the proposed approach both theoretically and numerically. The results demonstrate that our algorithm significantly reduces computational complexity compared with state-of-the-art deep reinforcement learning methods, while maintaining comparable generalization performance in terms of rewards accumulated across stages, such as survival time or cumulative survival probability.
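The abstract does not spell out the algorithm, so the following is only a minimal sketch of what kernel-based distributed Q-learning for a multi-stage DTR can look like, assuming a divide-and-conquer kernel ridge regression (KRR) scheme. The training data are split into `m` blocks, a Gaussian-kernel KRR estimator is fit on each block, and the block predictions are averaged. Q-functions are then fit by backward induction over the stages. Because each block solves an `(n/m) x (n/m)` linear system instead of one `n x n` system, the cubic cost of exact KRR drops accordingly, which is one plausible source of the scalability. Every name here (`fit_distributed_krr`, `fit_q_functions`, `recommend`) is hypothetical, and so are several other choices: one regressor per action, the use of the current state in place of the full patient history, the kernel width, the regularization `lam`, and the synthetic rewards. This is not the paper's implementation.

```python
import numpy as np


def gaussian_kernel(A, B, width):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * width**2))


def fit_local_krr(X, y, lam, width):
    """Kernel ridge regression on one data block; returns a predictor function."""
    n = len(X)
    alpha = np.linalg.solve(gaussian_kernel(X, X, width) + lam * n * np.eye(n), y)
    return lambda Z: gaussian_kernel(Z, X, width) @ alpha


def fit_distributed_krr(X, y, m, lam, width):
    """Divide-and-conquer KRR: fit m local estimators and average their predictions."""
    blocks = np.array_split(np.arange(len(X)), m)
    local = [fit_local_krr(X[idx], y[idx], lam, width) for idx in blocks]
    return lambda Z: np.mean([f(Z) for f in local], axis=0)


def fit_q_functions(states, actions, rewards, action_set=(0, 1), m=4, lam=1e-3, width=1.0):
    """Backward-induction Q-learning; Q_t(s, a) is one distributed KRR per action a."""
    T = len(states)
    q = [None] * T
    target = np.asarray(rewards[T - 1], dtype=float)  # last stage: Q_T fits the reward
    for t in reversed(range(T)):
        q[t] = {}
        for a in action_set:
            mask = actions[t] == a  # patients who received action a at stage t
            q[t][a] = fit_distributed_krr(states[t][mask], target[mask], m, lam, width)
        if t > 0:
            # Pseudo-outcome for stage t-1: its reward plus the best value at stage t.
            best = np.max([q[t][a](states[t]) for a in action_set], axis=0)
            target = rewards[t - 1] + best
    return q


def recommend(q, t, S, action_set=(0, 1)):
    """Greedy treatment rule at stage t: choose argmax_a Q_t(s, a) for each row of S."""
    scores = np.stack([q[t][a](S) for a in action_set])
    return np.asarray(action_set)[np.argmax(scores, axis=0)]


# Synthetic two-stage example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n, d, T = 400, 2, 2
states = [rng.uniform(-2, 2, size=(n, d)) for _ in range(T)]
actions = [rng.integers(0, 2, size=n) for _ in range(T)]
# Hypothetical reward: action 1 helps exactly when the first state feature is positive.
rewards = [s[:, 0] * (2 * a - 1) + 0.1 * rng.standard_normal(n) for s, a in zip(states, actions)]

q = fit_q_functions(states, actions, rewards)
S_test = np.array([[1.5, 0.0], [-1.5, 0.0]])
print(recommend(q, T - 1, S_test))
```

Averaging the local estimators, rather than, for example, keeping only one block, is what lets each block stay small while the combined estimator still uses all `n` samples. The number of blocks `m` trades computation against the accuracy of each local fit.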