Abstract
We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
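
As an illustration of point (2), the following is a minimal sketch of what an $n$-step backup in a chunked action space could look like. It is not the authors' implementation. The function name `chunked_td_target` and its parameters are hypothetical, and it assumes the critic scores a state together with a whole chunk of h actions, so that the h rewards observed while that chunk was executed can be summed directly before bootstrapping.

```python
from typing import Sequence


def chunked_td_target(
    rewards: Sequence[float],
    next_q: float,
    gamma: float,
    done: bool = False,
) -> float:
    """Sketch of an h-step TD target for a critic defined over action chunks.

    rewards: the h rewards observed while one action chunk was executed.
    next_q:  bootstrap value at the state reached after the chunk
             (hypothetical input, e.g. a target critic evaluated on the
             next state and the next action chunk).
    gamma:   per-step discount factor.
    done:    True if the episode terminated within the chunk.
    """
    h = len(rewards)
    # Discounted sum of the rewards collected inside the chunk.
    discounted_return = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # Bootstrap h steps ahead unless the episode has ended.
    bootstrap = 0.0 if done else (gamma ** h) * next_q
    return discounted_return + bootstrap


# Example call: a chunk of three steps whose only reward arrives at the last step.
target = chunked_td_target([0.0, 0.0, 1.0], next_q=2.0, gamma=0.5)
```

Under this assumption, the rewards in the sum were all produced by the same chunk that the critic is evaluating, which is one way to read the abstract's claim that the backup is unbiased.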