Deep reinforcement learning (DRL) has seen several successful applications to process control. Common methods rely on a deep neural network structure to model the controller or process. With increasingly complicated control structures, the closed-loop stability of such methods becomes less clear. In this work, we focus on the interpretability of DRL control methods. In particular, we view linear fixed-structure controllers as shallow neural networks embedded in the actor-critic framework. PID controllers guide our development due to their simplicity and acceptance in industrial practice. We then consider input saturation, leading to a simple nonlinear control structure. In order to effectively operate within the actuator limits, we then incorporate a tuning parameter for anti-windup compensation. Finally, the simplicity of the controller allows for straightforward initialization. This makes our method inherently stabilizing, both during and after training, and amenable to known operational PID gains.
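To make the controller structure concrete, the following is a minimal sketch of a discrete-time PID law written as a single linear layer followed by a saturation, with one extra anti-windup parameter. It is an illustration under stated assumptions rather than the formulation used in this work: the back-calculation form of anti-windup, the forward-Euler discretization, and the names `PIDState`, `pid_step`, and `rho` are all hypothetical choices made here.

```python
from dataclasses import dataclass


@dataclass
class PIDState:
    """Internal controller state (hypothetical container for this sketch)."""
    integral: float = 0.0    # running integral of the (compensated) error
    prev_error: float = 0.0  # error at the previous step, for the derivative


def pid_step(gains, state, error, dt, u_min, u_max):
    """One PID update with input saturation and back-calculation anti-windup.

    gains = (kp, ki, kd, rho): the first three play the role of the weights
    of a shallow linear network acting on the features (e, integral of e,
    de/dt); rho is the assumed anti-windup tuning parameter.
    """
    kp, ki, kd, rho = gains

    # Finite-difference approximation of the error derivative.
    derivative = (error - state.prev_error) / dt

    # Linear "layer": weighted sum of the three error features.
    u_raw = kp * error + ki * state.integral + kd * derivative

    # Saturation is the only nonlinearity: clip to the actuator limits.
    u_sat = min(max(u_raw, u_min), u_max)

    # Back-calculation (assumed here): feed the saturation excess into the
    # integrator so it stops accumulating while the actuator is at a limit.
    new_integral = state.integral + dt * (error + rho * (u_sat - u_raw))

    return u_sat, PIDState(integral=new_integral, prev_error=error)
```

In this sketch the learnable parameters would be the four entries of `gains`, so an RL agent could start from known operational PID gains rather than from random weights. Setting `rho = 0` recovers a plain PID law with a clipped output.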