Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning

  • 2024-05-06 22:49:29
  • Aditya A. Ramesh, Kenny Young, Louis Kirsch, Jürgen Schmidhuber


Temporal credit assignment in reinforcement learning is challenging due to delayed and stochastic outcomes. Monte Carlo targets can bridge long delays between action and consequence but lead to high-variance targets due to stochasticity. Temporal difference (TD) learning uses bootstrapping to overcome variance but introduces a bias that can only be corrected through many iterations. TD($\lambda$) provides a mechanism to navigate this bias-variance tradeoff smoothly. Appropriately selecting $\lambda$ can significantly improve performance. Here, we propose Chunked-TD, which uses predicted probabilities of transitions from a model for computing $\lambda$-return targets. Unlike other model-based solutions to credit assignment, Chunked-TD is less vulnerable to model inaccuracies. Our approach is motivated by the principle of history compression and 'chunks' trajectories for conventional TD learning. Chunking with learned world models compresses near-deterministic regions of the environment-policy interaction to speed up credit assignment while still bootstrapping when necessary. We propose algorithms that can be implemented online and show that they solve some problems much faster than conventional TD($\lambda$).
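The abstract does not spell out the update rule, so the following is only a minimal sketch under stated assumptions. It computes $\lambda$-return targets with the standard backward recursion $G^\lambda_t = r_{t+1} + \gamma\,[(1-\lambda_{t+1})\,V(s_{t+1}) + \lambda_{t+1}\,G^\lambda_{t+1}]$, allowing a different $\lambda$ per step. A constant $\lambda$ recovers ordinary TD($\lambda$): $\lambda = 0$ gives one-step TD targets and $\lambda = 1$ gives Monte Carlo returns. The "chunked" variant here *assumes* that each step's $\lambda$ is a model's predicted probability of the transition actually observed, so near-deterministic transitions (probability near 1) are passed through as in Monte Carlo while surprising ones (probability near 0) trigger bootstrapping. The names `lambda_returns`, `chunked_lambdas`, and `transition_probs` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def lambda_returns(rewards: Sequence[float], values: Sequence[float],
                   lambdas: Sequence[float], gamma: float) -> List[float]:
    """Offline lambda-returns with a per-step lambda (backward recursion).

    rewards[t] : reward r_{t+1} received after the step from s_t, t = 0..T-1
    values[t]  : value estimate V(s_t), t = 0..T (values[T] = 0 if s_T is terminal)
    lambdas[t] : lambda applied at s_{t+1}, i.e. how much to trust the
                 rest of the trajectory instead of bootstrapping from V(s_{t+1})
    """
    T = len(rewards)
    if len(values) != T + 1 or len(lambdas) != T:
        raise ValueError("need len(values) == T + 1 and len(lambdas) == T")
    returns = [0.0] * T
    g = values[T]  # the return beyond the last state is its value estimate
    for t in reversed(range(T)):
        lam = lambdas[t]
        # Blend the bootstrap value V(s_{t+1}) with the longer return G_{t+1}.
        g = rewards[t] + gamma * ((1.0 - lam) * values[t + 1] + lam * g)
        returns[t] = g
    return returns


def chunked_lambdas(transition_probs: Sequence[float]) -> List[float]:
    """Hypothetical chunking rule: lambda_t = model's probability of the observed
    transition, clipped to [0, 1]. Predictable steps are 'chunked' together."""
    return [min(1.0, max(0.0, p)) for p in transition_probs]


if __name__ == "__main__":
    rewards = [0.0, 0.0, 1.0]
    values = [0.0, 0.5, 0.5, 0.0]
    # Assumed model probabilities: the second transition was surprising.
    probs = [1.0, 0.0, 1.0]
    print(lambda_returns(rewards, values, chunked_lambdas(probs), gamma=1.0))
```

In this toy trajectory the model is certain about the first and last transitions but not the middle one, so the target for $s_0$ skips straight over the first step and bootstraps from $V(s_2)$, while the final step uses the observed reward directly. This is one way to picture "compressing near-deterministic regions while still bootstrapping when necessary"; the paper's actual online algorithms may differ.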

