The NeurIPS 2020 Procgen Competition was designed as a centralized benchmark with clearly defined tasks for measuring sample efficiency and generalization in reinforcement learning. Generalization remains one of the most fundamental challenges in deep reinforcement learning, yet the community lacks sufficient benchmarks to measure its progress on it. We present the design of a centralized benchmark for reinforcement learning that measures sample efficiency and generalization through scalable end-to-end evaluation of the training and rollout phases of thousands of user-submitted code bases. We built the benchmark on top of the existing Procgen Benchmark by defining clear tasks and standardizing the end-to-end evaluation setups. The design aims to maximize the flexibility available to researchers who wish to design future iterations of such benchmarks, while imposing the practical constraints necessary for such a system to scale. This paper presents the competition setup, along with the details and analysis of the top solutions identified through it, in the context of the 2020 iteration of the competition at NeurIPS.