Policy Transfer with Strategy Optimization

Abstract

Computer simulation provides an automatic and safe way to train robotic control policies for complex tasks such as locomotion. However, a policy trained in simulation usually does not transfer directly to real hardware because of the differences between the two environments. Transfer learning using domain randomization is a promising approach, but it usually assumes that the target environment is close to the distribution of the training environments, and thus relies heavily on accurate system identification. In this paper, we present a different approach that leverages domain randomization to transfer control policies to unknown environments. The key idea is that, instead of learning a single policy in simulation, we simultaneously learn a family of policies that exhibit different behaviors. When tested in the target environment, we directly search for the best policy in the family based on task performance, without the need to identify the dynamic parameters. We evaluate our method on five simulated robotic control problems with different discrepancies between the training and testing environments, and demonstrate that our method can overcome larger modeling errors than training a robust policy or an adaptive policy.
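As a rough illustration of the search step described in the abstract, the sketch below picks the best member of a policy family using only the task return observed in the target environment. Everything in it is an assumption made for illustration: the one-dimensional point-mass environment, the linear policy family indexed by a scalar `strategy`, the names `rollout_return` and `search_strategy`, and the simple elite-averaging evolutionary search, which stands in for whatever optimizer the paper actually uses.

```python
import random


def rollout_return(strategy, target_gain=1.7, steps=50):
    """Hypothetical target environment: a 1-D point mass driven toward the origin.

    The policy family is u = -strategy * x. `target_gain` stands in for an
    unknown dynamic parameter; the search below never reads or estimates it.
    """
    x, total = 1.0, 0.0
    for _ in range(steps):
        u = -strategy * x                # policy selected by the strategy value
        x = x + 0.1 * target_gain * u    # dynamics of the target environment
        total -= x * x                   # task reward: stay close to the origin
    return total


def search_strategy(evaluate, low, high, iterations=20,
                    population=16, elite=4, seed=0):
    """Evolutionary search over the strategy, guided by task return only."""
    rng = random.Random(seed)
    mean, spread = 0.5 * (low + high), 0.5 * (high - low)
    best, best_return = mean, evaluate(mean)
    for _ in range(iterations):
        # Sample candidate strategies around the current mean, clipped to bounds.
        candidates = [min(high, max(low, rng.gauss(mean, spread)))
                      for _ in range(population)]
        # Rank candidates by the return they achieve in the target environment.
        scored = sorted(((evaluate(c), c) for c in candidates), reverse=True)
        elites = [c for _, c in scored[:elite]]
        # Refit the sampling distribution to the elite candidates.
        mean = sum(elites) / elite
        variance = sum((c - mean) ** 2 for c in elites) / elite
        spread = max(1e-3, variance ** 0.5)
        if scored[0][0] > best_return:
            best_return, best = scored[0]
    return best, best_return
```

Calling `search_strategy(rollout_return, 0.0, 10.0)` returns a strategy value and the return it achieved. The point of the sketch is that the optimizer only queries task performance, so no system identification of `target_gain` is involved.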
