WALL-E: An Efficient Reinforcement Learning Research Framework

Abstract

RL systems have two halves: experience collection and policy learning. When rollouts require a large number of samples, experience collection is the major bottleneck, so rollout generation must be sped up with multi-process architecture support. Our work, dubbed WALL-E, uses multiple rollout samplers running in parallel to rapidly generate experience. With parallel samplers, we observe not only faster convergence but also higher average reward thresholds. For example, on the MuJoCo HalfCheetah-v2 task, with $N = 10$ parallel sampler processes, we achieve a much higher average return than with a single-process architecture.
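
The parallel-sampler idea can be illustrated with a minimal sketch. This is not WALL-E's actual implementation: the toy 1-D environment, the noisy linear policy, and the names `sample_rollout`, `collect_experience`, `gain`, and `num_samplers` are all assumptions made for illustration, with Python's standard `multiprocessing.Pool` standing in for the framework's sampler processes. Setting `num_samplers=1` gives the single-process baseline.

```python
import multiprocessing as mp
import random


def sample_rollout(args):
    """Run one episode of a toy environment and return its transitions.

    The toy dynamics (a 1-D point pushed toward the origin) are a stand-in
    for a real simulator such as MuJoCo; they are not taken from the paper.
    """
    seed, horizon, gain = args
    rng = random.Random(seed)  # per-rollout seed keeps samplers independent
    state = rng.uniform(-1.0, 1.0)
    transitions = []
    for _ in range(horizon):
        action = -gain * state + rng.gauss(0.0, 0.1)  # noisy linear policy (assumed)
        next_state = state + 0.1 * action
        reward = -next_state ** 2  # higher reward for staying near the origin
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions


def collect_experience(num_rollouts, horizon, gain, num_samplers=1):
    """Collect rollouts with `num_samplers` processes (1 = single-process baseline)."""
    jobs = [(seed, horizon, gain) for seed in range(num_rollouts)]
    if num_samplers == 1:
        rollouts = [sample_rollout(job) for job in jobs]
    else:
        # Each worker process generates whole rollouts independently.
        with mp.Pool(processes=num_samplers) as pool:
            rollouts = pool.map(sample_rollout, jobs)
    # Flatten into one batch that a policy-learning step could consume.
    return [t for rollout in rollouts for t in rollout]


if __name__ == "__main__":
    batch = collect_experience(num_rollouts=20, horizon=50, gain=0.5, num_samplers=10)
    print(len(batch))
```

Because each rollout is seeded independently, the collected batch is the same whether it is gathered by one process or by ten; only the wall-clock time of the collection phase changes.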
