With the growing prevalence of smart grid technology, short-term load forecasting (STLF) has become particularly important in power system operations. A large collection of methods has been developed for STLF, but selecting a suitable method under varying conditions remains challenging. This paper develops a novel reinforcement learning based dynamic model selection (DMS) method for STLF. A forecasting model pool is first built, comprising ten state-of-the-art machine learning based forecasting models. A Q-learning agent then learns the optimal policy for selecting the best forecasting model for the next time step, based on model performance. The optimal DMS policy is applied to select the best model at each time step with a moving window. Numerical simulations on two years of load and weather data show that the Q-learning algorithm converges quickly, resulting in effective and efficient DMS. The developed STLF model with Q-learning based DMS improves forecasting accuracy by approximately 50% compared to the state-of-the-art machine learning based STLF models.
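To make the selection mechanism concrete, the following is a minimal sketch of tabular Q-learning over a model pool. It is an illustration under stated assumptions, not the formulation used in this paper: the state is assumed to be the index of the currently selected model, the action is the model chosen for the next time step, and the reward is assumed to be the negative absolute forecast error of the chosen model. The function names `q_learning_dms` and `select_models`, the hyperparameter values, and the precomputed `errors` array are all hypothetical, and the moving window is not modeled.

```python
import numpy as np


def q_learning_dms(errors, alpha=0.1, gamma=0.9, epsilon=0.1, episodes=50, seed=0):
    """Learn a Q-table for picking the next forecasting model.

    errors: array of shape (T, M) holding the absolute forecast error of
    each of M models at each of T time steps (assumed precomputed).
    Returns a Q-table of shape (M, M): rows are states (current model),
    columns are actions (model for the next step).
    """
    rng = np.random.default_rng(seed)
    n_steps, n_models = errors.shape
    q = np.zeros((n_models, n_models))

    for _ in range(episodes):
        state = int(rng.integers(n_models))  # random starting model
        for t in range(n_steps):
            # Epsilon-greedy choice between exploring and exploiting.
            if rng.random() < epsilon:
                action = int(rng.integers(n_models))
            else:
                action = int(np.argmax(q[state]))

            # Assumed reward: a lower error of the chosen model is better.
            reward = -errors[t, action]
            next_state = action

            # Standard Q-learning temporal-difference update.
            td_target = reward + gamma * q[next_state].max()
            q[state, action] += alpha * (td_target - q[state, action])
            state = next_state

    return q


def select_models(q, start_state, n_steps):
    """Follow the greedy policy of a learned Q-table for n_steps."""
    state = start_state
    chosen = []
    for _ in range(n_steps):
        state = int(np.argmax(q[state]))
        chosen.append(state)
    return chosen
```

In such a sketch, `errors` would be produced by running every model in the pool on a training window; the greedy policy from `select_models` then names the model to use at each subsequent step.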