Offline Reinforcement Learning at Multiple Frequencies

  • 2022-07-26 18:54:49
  • Kaylee Burns, Tianhe Yu, Chelsea Finn, Karol Hausman
  • 38


Leveraging many sources of offline robot data requires grappling with the heterogeneity of such data. In this paper, we focus on one particular aspect of heterogeneity: learning from offline data collected at different control frequencies. Across labs, the discretization of controllers, sampling rates of sensors, and demands of a task of interest may differ, giving rise to a mixture of frequencies in an aggregated dataset. We study how well offline reinforcement learning (RL) algorithms can accommodate data with a mixture of frequencies during training. We observe that the $Q$-value propagates at different rates for different discretizations, leading to a number of learning challenges for off-the-shelf offline RL. We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning. By scaling the value of $N$ in $N$-step returns with the discretization size, we effectively balance $Q$-value propagation, leading to more stable convergence. On three simulated robotic control problems, we empirically find that this simple approach outperforms naive mixing by 50% on average.
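The idea of scaling $N$ with the discretization can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that $N$ is chosen so that $N \cdot \Delta t$ covers roughly the same physical time for every trajectory, and the names `adaptive_n`, `n_step_target`, `ref_dt`, and `ref_n` are hypothetical, introduced here only for illustration.

```python
def adaptive_n(dt, ref_dt=0.01, ref_n=1):
    """Pick N so that N * dt roughly matches ref_n * ref_dt.

    Assumption (not stated in the abstract): a reference discretization
    ref_dt uses ref_n-step returns, and finer discretizations use
    proportionally more steps.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    return max(1, round(ref_n * ref_dt / dt))


def n_step_target(rewards, bootstrap_q, gamma, n):
    """Standard N-step return: discounted rewards plus a bootstrapped Q-value.

    rewards     -- rewards r_t, ..., r_{t+n-1} (at least n entries)
    bootstrap_q -- Q-value estimate at the state reached after n steps
    gamma       -- per-step discount factor
    """
    if len(rewards) < n:
        raise ValueError("need at least n rewards")
    target = 0.0
    for k in range(n):
        target += (gamma ** k) * rewards[k]
    return target + (gamma ** n) * bootstrap_q


# Data collected at dt = 0.005 s would use twice as many steps as the
# reference data at dt = 0.01 s under the assumption above.
n_fine = adaptive_n(0.005)
target = n_step_target([1.0, 1.0, 1.0], bootstrap_q=10.0, gamma=0.5, n=n_fine)
```

With these choices, a single bootstrapped update spans a similar amount of physical time regardless of the control frequency, which is one plausible reading of balancing $Q$-value propagation across discretizations.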


