Learning Robust and Adaptive Real-World Continuous Control Using Simulation and Transfer Learning

Abstract

We use model-free reinforcement learning, extensive simulation, and transfer learning to develop a continuous control algorithm that has good zero-shot performance in a real physical environment. We train a simulated agent to act optimally across a set of similar environments, each with dynamics drawn from a prior distribution. We propose that the agent is able to adjust its actions almost immediately, based on a small set of observations. This robust and adaptive behavior is enabled by a policy gradient algorithm with a Long Short-Term Memory (LSTM) function approximator. Finally, we train an agent to navigate a two-dimensional environment with uncertain dynamics and noisy observations, and we demonstrate that this agent has good zero-shot performance in a real physical environment. Our preliminary results indicate that the agent is able to infer the environmental dynamics after only a few timesteps and adjust its actions accordingly.
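The abstract does not specify the environment, the prior, or the network, so the following is only a minimal sketch of the training setup it describes: every episode runs in a freshly sampled environment whose dynamics come from a prior, and the agent sees noisy observations, never the dynamics themselves. Everything named here is a hypothetical illustration, not the authors' code: the 2D point mass, the hidden `mass` and `friction` parameters, the uniform `PRIOR` ranges, `obs_noise`, and the `policy(obs, state)` interface that stands in for an LSTM policy's recurrent state.

```python
import random
from dataclasses import dataclass


@dataclass
class Dynamics:
    """Hidden per-episode physical parameters (hypothetical choice)."""
    mass: float      # scales how strongly an action accelerates the agent
    friction: float  # linear damping coefficient on velocity


# Assumed uniform prior ranges; not taken from the paper.
PRIOR = {"mass": (0.5, 2.0), "friction": (0.0, 0.5)}


def sample_dynamics(rng: random.Random) -> Dynamics:
    """Draw one environment's dynamics from the prior distribution."""
    return Dynamics(mass=rng.uniform(*PRIOR["mass"]),
                    friction=rng.uniform(*PRIOR["friction"]))


class PointMass2D:
    """Hypothetical 2D point mass with hidden dynamics and noisy observations."""

    def __init__(self, dyn: Dynamics, rng: random.Random,
                 obs_noise: float = 0.05, dt: float = 0.1):
        self.dyn, self.rng = dyn, rng
        self.obs_noise, self.dt = obs_noise, dt
        self.pos = [0.0, 0.0]
        self.vel = [0.0, 0.0]

    def observe(self) -> list:
        # Gaussian noise on each coordinate; the agent never sees self.dyn.
        return [p + self.rng.gauss(0.0, self.obs_noise) for p in self.pos]

    def step(self, action) -> list:
        # Semi-implicit Euler: force / mass, minus linear friction on velocity.
        for i in range(2):
            acc = action[i] / self.dyn.mass - self.dyn.friction * self.vel[i]
            self.vel[i] += acc * self.dt
            self.pos[i] += self.vel[i] * self.dt
        return self.observe()


def rollout(policy, n_episodes: int, horizon: int, seed: int = 0):
    """Run episodes, each in an environment with freshly sampled dynamics.

    `policy(obs, state) -> (action, state)` is a stand-in for a recurrent
    policy: `state` starts as None every episode and carries history across
    timesteps, which is what an LSTM would use to infer the hidden dynamics.
    """
    rng = random.Random(seed)
    episodes = []
    for _ in range(n_episodes):
        env = PointMass2D(sample_dynamics(rng), rng)
        obs, state, traj = env.observe(), None, []
        for _ in range(horizon):
            action, state = policy(obs, state)
            obs = env.step(action)
            traj.append((action, obs))
        episodes.append((env.dyn, traj))
    return episodes


def constant_policy(obs, state):
    """Trivial placeholder policy: push along +x and count timesteps."""
    t = 0 if state is None else state
    return [1.0, 0.0], t + 1


episodes = rollout(constant_policy, n_episodes=3, horizon=5, seed=1)
```

Resetting `state` at the start of each episode matters in this setup: the dynamics change between episodes, so any information the policy accumulates about them must be re-inferred from the new episode's observations. In a real implementation, `constant_policy` would be replaced by a trained recurrent network whose parameters are updated with a policy gradient method.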
