Recent years have seen growing interest in using machine learning, particularly reinforcement learning (RL), for production scheduling problems of varying complexity. The general approach is to formulate the scheduling problem as a Markov Decision Process (MDP) and then train an RL agent in a simulation that implements the MDP. Since existing studies rely on simulations, some of them complex, for which the code is unavailable, the experiments presented are hard or, in the case of stochastic environments, impossible to reproduce accurately. Furthermore, there is a vast array of RL designs to choose from. To make RL methods widely applicable in production scheduling and to establish their strengths for industry, standardized model descriptions (covering both the production setup and the RL design) and a standardized validation scheme are prerequisites. Our contribution is threefold. First, we standardize the description of production setups used in RL studies based on established nomenclature. Second, we classify RL design choices from existing publications. Third, we propose recommendations for a validation scheme focusing on reproducibility and sufficient benchmarking.