Offline Reinforcement Learning at Multiple Frequencies

Abstract

Leveraging many sources of offline robot data requires grappling with the heterogeneity of such data. In this paper, we focus on one particular aspect of heterogeneity: learning from offline data collected at different control frequencies. Across labs, the discretization of controllers, sampling rates of sensors, and demands of a task of interest may differ, giving rise to a mixture of frequencies in an aggregated dataset. We study how well offline reinforcement learning (RL) algorithms can accommodate data with a mixture of frequencies during training. We observe that the $Q$-value propagates at different rates for different discretizations, leading to a number of learning challenges for off-the-shelf offline RL. We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning. By scaling the value of $N$ in $N$-step returns with the discretization size, we effectively balance $Q$-value propagation, leading to more stable convergence. On three simulated robotic control problems, we empirically find that this simple approach outperforms naïve mixing by 50% on average.
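As a rough illustration of scaling $N$ with the discretization size, the sketch below picks $N$ so that every $N$-step return spans the same amount of physical time regardless of the timestep at which a trajectory was recorded. The rule `N = horizon / dt`, the function names, and the example timesteps are assumptions made for illustration and are not taken from the abstract; the per-step discount `gamma` is passed in unchanged, and whether it should also be rescaled per frequency is left open here.

```python
def n_for_dt(dt, horizon_seconds):
    """Hypothetical rule: choose N so that N * dt covers the same physical time.

    A finer discretization (smaller dt) gets a larger N.
    """
    return max(1, round(horizon_seconds / dt))


def n_step_target(rewards, bootstrap_value, gamma):
    """Standard discounted N-step return.

    Computes sum_{k<N} gamma^k * r_k + gamma^N * bootstrap_value,
    where N = len(rewards) and bootstrap_value is the value estimate
    at the state reached after N steps.
    """
    target = bootstrap_value
    # Fold from the last reward backwards: target = r + gamma * target.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


if __name__ == "__main__":
    horizon = 0.04  # seconds of physical time per backup (assumed value)
    for dt in (0.01, 0.02, 0.04):  # assumed example timesteps
        n = n_for_dt(dt, horizon)
        print(f"dt={dt}: N={n}, N*dt={n * dt:.2f}s")
```

With these assumed values, data recorded at a 0.01 s timestep would use 4-step returns and data recorded at 0.04 s would use 1-step returns, so each backup moves value information across the same 0.04 s of physical time.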
