End-to-end delay is a critical attribute of quality of service (QoS) in application domains such as cloud computing and computer networks. This metric is particularly important in tandem service systems, where the end-to-end service is provided through a chain of services. Service-rate control is a common mechanism for providing QoS guarantees in service systems. In this paper, we introduce a reinforcement learning-based (RL-based) service-rate controller that provides probabilistic upper bounds on the end-to-end delay of the system, while preventing the overuse of service resources. In order to have a general framework, we use queueing theory to model the service systems. However, we adopt an RL-based approach to avoid the limitations of queueing-theoretic methods. In particular, we use Deep Deterministic Policy Gradient (DDPG) to learn the service rates (action) as a function of the queue lengths (state) in tandem service systems. In contrast to existing RL-based methods that quantify their performance by the achieved overall reward, which could be hard to interpret or even misleading, our proposed controller provides explicit probabilistic guarantees on the end-to-end delay of the system. The evaluations are presented for a tandem queueing system with non-exponential inter-arrival and service times, the results of which validate our controller's capability in meeting QoS constraints.
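To make the notion of a probabilistic upper bound on end-to-end delay concrete, the following is a minimal sketch, not the DDPG controller itself. It assumes a FIFO tandem of single-server queues with fixed (not learned) service rates and gamma-distributed inter-arrival and service times as one possible non-exponential choice. The function names, the gamma shape parameter, and the numeric values are all hypothetical. The sketch estimates the empirical probability that the end-to-end delay exceeds a bound, which is the kind of quantity a guarantee of the form P(delay > d_ub) <= epsilon constrains.

```python
import random


def simulate_tandem_delays(num_jobs, arrival_rate, service_rates, shape=2.0, seed=0):
    """Return the end-to-end delay of each job in a FIFO tandem queue.

    Assumption: inter-arrival and service times are gamma distributed with
    the given shape, scaled so that their mean equals 1 / rate.
    """
    rng = random.Random(seed)

    def draw(rate):
        # Gamma(shape, scale) has mean shape * scale, so this has mean 1 / rate.
        return rng.gammavariate(shape, 1.0 / (shape * rate))

    arrival = 0.0
    last_departure = [0.0] * len(service_rates)  # last departure time per stage
    delays = []
    for _ in range(num_jobs):
        arrival += draw(arrival_rate)
        t = arrival  # time at which the job reaches the current stage
        for k, mu in enumerate(service_rates):
            # Service starts once the job has arrived and the server is free.
            start = max(t, last_departure[k])
            t = start + draw(mu)
            last_departure[k] = t
        delays.append(t - arrival)
    return delays


def violation_probability(delays, delay_bound):
    """Empirical fraction of jobs whose end-to-end delay exceeds the bound."""
    return sum(d > delay_bound for d in delays) / len(delays)


if __name__ == "__main__":
    # Hypothetical setting: two stages, each faster than the arrival stream.
    delays = simulate_tandem_delays(20000, arrival_rate=1.0, service_rates=[1.5, 2.0])
    print("estimated P(delay > 5.0):", violation_probability(delays, 5.0))
```

In this sketch, raising the service rates lowers the estimated violation probability at the cost of more service resources. A service-rate controller of the kind described above would instead adjust the rates as a function of the observed queue lengths.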