Abstract
Reproducibility in reinforcement learning is challenging: uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself, has led researchers to report the performance of learned agents using aggregate performance metrics over multiple random seeds for a single environment. Unfortunately, there are still pernicious sources of variability in reinforcement learning agents that make common summary statistics an unsound measure of performance. Our experiments demonstrate the variability of common agents used in the popular OpenAI Baselines repository. We make the case for reporting post-training agent performance as a distribution, rather than a point estimate.
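As a minimal illustration of why a point estimate can mislead, the sketch below summarizes two sets of post-training episode returns that share the same mean but have very different shapes. The numbers are made up for illustration and are not results from the paper; the helper `summarize_returns` is a hypothetical name, not part of OpenAI Baselines.

```python
import statistics


def summarize_returns(returns):
    """Return a point estimate alongside simple distributional summaries."""
    ordered = sorted(returns)
    # statistics.quantiles with n=4 yields the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


# Hypothetical episode returns (illustrative values only, not measured data).
# The first agent either fails or succeeds; the second is consistently average.
bimodal_returns = [10.0, 12.0, 11.0, 9.0, 190.0, 188.0, 191.0, 189.0]
unimodal_returns = [98.0, 101.0, 100.0, 99.0, 102.0, 100.0, 99.0, 101.0]

bimodal = summarize_returns(bimodal_returns)
unimodal = summarize_returns(unimodal_returns)

for name, summary in (("bimodal", bimodal), ("unimodal", unimodal)):
    print(name, summary)
```

Both summaries report the same mean, while the quartiles and range show that the two agents behave very differently. A histogram or empirical CDF of the returns would convey this even more directly.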