Abstract
Reinforcement Learning (RL) has achieved state-of-the-art results in domains such as robotics and games. We build on this previous work by applying RL algorithms to a selection of canonical online stochastic optimization problems with a range of practical applications: Bin Packing, Newsvendor, and Vehicle Routing. While there is a nascent literature that applies RL to these problems, there are no commonly accepted benchmarks that can be used to compare proposed approaches rigorously in terms of performance, scale, or generalizability. This paper aims to fill that gap. For each problem we apply both standard approaches and newer RL algorithms and analyze the results. In each case, the performance of the trained RL policy is competitive with or superior to the corresponding baselines, while requiring little domain knowledge. This highlights the potential of RL in real-world dynamic resource allocation problems.