Variance Reduction for Reinforcement Learning in Input-Driven Environments

Abstract

We consider reinforcement learning in input-driven environments, where an exogenous, stochastic input process affects the dynamics of the system. Input processes arise in many applications, including queuing systems, robotics control with disturbances, and object tracking. Since the state dynamics and rewards depend on the input process, the state alone provides limited information for the expected future returns. Therefore, policy gradient methods with standard state-dependent baselines suffer high variance during training. We derive a bias-free, input-dependent baseline to reduce this variance, and analytically show its benefits over state-dependent baselines. We then propose a meta-learning approach to overcome the complexity of learning a baseline that depends on a long sequence of inputs. Our experimental results show that across environments from queuing systems, computer networks, and MuJoCo robotic locomotion, input-dependent baselines consistently improve training stability and result in better eventual policies.
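
The variance argument can be illustrated with a toy one-step problem. The sketch below is not taken from the paper's experiments. It assumes a single state, a Bernoulli policy with a sigmoid parameter `theta`, a Gaussian exogenous input `z`, and a reward `r = z + a`; the function name `gradient_samples` and all parameter values are hypothetical. It compares per-sample policy gradient estimates under a constant (state-dependent) baseline and under a baseline that also conditions on the input.

```python
import numpy as np

def gradient_samples(n=100_000, theta=0.5, seed=0):
    """Per-sample policy gradient estimates under two baselines (toy setup)."""
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-theta))        # probability of choosing action 1
    z = rng.normal(0.0, 5.0, size=n)         # exogenous input, independent of the action
    a = (rng.random(n) < p).astype(float)    # sampled actions in {0, 1}
    r = z + a                                # reward: input-driven term plus action effect
    score = a - p                            # d/dtheta log pi(a) for a Bernoulli-sigmoid policy
    b_state = p                              # E[r] = E[z] + p: constant, since there is one state
    b_input = z + p                          # E[r | z]: baseline that also sees the input
    g_state = score * (r - b_state)
    g_input = score * (r - b_input)
    return g_state, g_input

if __name__ == "__main__":
    g_state, g_input = gradient_samples()
    print("mean:", g_state.mean(), g_input.mean())
    print("variance:", g_state.var(), g_input.var())
```

In this toy setting both estimators target the same gradient, `p * (1 - p)`, because neither baseline depends on the action. The input-dependent one removes the reward noise contributed by `z`, so its variance should be far smaller.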
