Abstract
Offline reinforcement learning often requires a high-quality dataset on which to train a policy. However, in many situations it is not possible to obtain such a dataset, nor is it easy to train a policy that performs well in the actual environment given the offline data. We propose using data distillation to synthesize a better dataset, which can then be used to train a better policy model. We show that our method is able to synthesize a dataset such that a model trained on it achieves performance similar to that of a model trained on the full dataset or a model trained using percentile behavioral cloning. Our project site is available at https://datasetdistillation4rl.github.io. We also provide our implementation at this GitHub repository: https://github.com/ggflow123/DDRL.