Improving Offline Reinforcement Learning with Inaccurate Simulators

Abstract

Offline reinforcement learning (RL) provides a promising approach to avoid costly online interaction with the real environment. However, the performance of offline RL highly depends on the quality of the datasets, which may cause extrapolation error in the learning process. In many robotic applications, an inaccurate simulator is often available. However, data collected from the inaccurate simulator cannot be used directly in offline RL due to the well-known exploration-exploitation dilemma and the dynamic gap between the inaccurate simulation and the real environment. To address these issues, we propose a novel approach to combine the offline dataset and the inaccurate simulation data in a better manner. Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset. Given this, we collect data from the inaccurate simulator starting from the distribution provided by the generator and reweight the simulated data using the discriminator. Our experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method can benefit more from both the inaccurate simulator and limited offline datasets to achieve better performance than the state-of-the-art methods.
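
The discriminator-based reweighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so the code below assumes the standard GAN density-ratio estimate w = D(s) / (1 - D(s)), where D(s) is the discriminator's probability that state s comes from the offline dataset. The function names, the clipping threshold max_weight, and the weighted squared-error loss are hypothetical choices for illustration, not the authors' implementation.

```python
def discriminator_weights(d_scores, eps=1e-6, max_weight=10.0):
    """Turn discriminator outputs in [0, 1] into importance weights.

    Assumption: the weight is the density-ratio estimate D / (1 - D),
    clipped at max_weight to limit variance. This is a common choice,
    not necessarily the rule used in the paper.
    """
    weights = []
    for d in d_scores:
        # Keep the score away from 0 and 1 so the ratio stays finite.
        d = min(max(d, eps), 1.0 - eps)
        weights.append(min(d / (1.0 - d), max_weight))
    return weights


def weighted_mean_squared_error(errors, weights):
    """Weighted average of squared errors, normalized by the weight sum."""
    total = sum(weights)
    return sum(w * e * e for w, e in zip(weights, errors)) / total


# Hypothetical usage: simulated transitions whose states look more like
# the offline data (higher D) contribute more to the loss.
scores = [0.9, 0.5, 0.1]          # discriminator outputs for simulated states
td_errors = [0.2, -0.4, 1.0]      # illustrative per-sample errors
w = discriminator_weights(scores)
loss = weighted_mean_squared_error(td_errors, w)
```

Under this assumption, a simulated state that the discriminator cannot distinguish from offline data (D = 0.5) receives weight 1, while states it confidently rejects receive weights near 0.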
