Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control

Abstract

Reinforcement learning (RL) is rapidly reaching and surpassing human-level control capabilities. However, state-of-the-art RL algorithms often require timesteps and reaction times significantly faster than human capabilities, which is impractical in real-world settings and typically necessitates specialized hardware. We introduce Sequence Reinforcement Learning (SRL), an RL algorithm designed to produce a sequence of actions for a given input state, enabling effective control at lower decision frequencies. SRL addresses the challenges of learning action sequences by employing both a model and an actor-critic architecture operating at different temporal scales. We propose a "temporal recall" mechanism, where the critic uses the model to estimate intermediate states between primitive actions, providing a learning signal for each individual action within the sequence. Once training is complete, the actor can generate action sequences independently of the model, achieving model-free control at a slower frequency. We evaluate SRL on a suite of continuous control tasks, demonstrating that it achieves performance comparable to state-of-the-art algorithms while significantly reducing actor sample complexity. To better assess performance across varying decision frequencies, we introduce the Frequency-Averaged Score (FAS) metric. Our results show that SRL significantly outperforms traditional RL algorithms in terms of FAS, making it particularly suitable for applications requiring variable decision frequencies. Furthermore, we compare SRL with model-based online planning, showing that SRL achieves comparable FAS while leveraging the same model during training that online planners use for planning. Lastly, we highlight the biological relevance of SRL, showing that it replicates the "action chunking" behavior observed in the basal ganglia, offering insights into brain-inspired control mechanisms.
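
The "temporal recall" idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `temporal_recall` and the toy `toy_actor`, `toy_model`, and `toy_critic` stand-ins are hypothetical names introduced here, and the dynamics and value function are arbitrary placeholders. The sketch only shows the structure the abstract states: the actor emits several actions from one observed state, the model estimates the intermediate states, and the critic scores each primitive action at its estimated state.

```python
from typing import Callable, List, Sequence, Tuple

State = List[float]
Action = float


def temporal_recall(
    state: State,
    actions: Sequence[Action],
    model: Callable[[State, Action], State],
    critic: Callable[[State, Action], float],
) -> List[Tuple[State, Action, float]]:
    """Score every action in a sequence at its model-estimated state.

    Only the first state is observed; all later states come from the model.
    """
    scored = []
    current = state
    for action in actions:
        # The critic evaluates the primitive action at the current
        # (observed or estimated) state, giving a per-action signal.
        scored.append((current, action, critic(current, action)))
        # The model fills in the intermediate state the agent never observes.
        current = model(current, action)
    return scored


# --- Hypothetical toy stand-ins, for illustration only ---

def toy_model(state: State, action: Action) -> State:
    """Assumed point-mass dynamics with a 0.1 time step."""
    position, velocity = state
    new_velocity = velocity + 0.1 * action
    return [position + 0.1 * new_velocity, new_velocity]


def toy_critic(state: State, action: Action) -> float:
    """Assumed value: penalize distance from the origin and large actions."""
    position, _velocity = state
    return -(position ** 2) - 0.01 * action ** 2


def toy_actor(state: State, sequence_length: int) -> List[Action]:
    """Assumed policy: repeat one push toward the origin for the whole sequence."""
    position, _velocity = state
    return [-position] * sequence_length


if __name__ == "__main__":
    observed = [1.0, 0.0]
    plan = toy_actor(observed, sequence_length=3)
    for est_state, action, q_value in temporal_recall(observed, plan, toy_model, toy_critic):
        print(est_state, action, q_value)
```

In this sketch the model is needed only to produce the per-action scores. Calling `toy_actor` alone yields the full action sequence from a single observation, which mirrors the abstract's claim that the trained actor can run without the model at a lower decision frequency.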
