Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control

Abstract

Reinforcement learning (RL) is rapidly reaching and surpassing human-level control capabilities. However, state-of-the-art RL algorithms often require timesteps and reaction times significantly faster than human capabilities, which is impractical in real-world settings and typically necessitates specialized hardware. We introduce Sequence Reinforcement Learning (SRL), an RL algorithm designed to produce a sequence of actions for a given input state, enabling effective control at lower decision frequencies. SRL addresses the challenges of learning action sequences by employing both a model and an actor-critic architecture operating at different temporal scales. We propose a "temporal recall" mechanism, where the critic uses the model to estimate intermediate states between primitive actions, providing a learning signal for each individual action within the sequence. Once training is complete, the actor can generate action sequences independently of the model, achieving model-free control at a slower frequency. We evaluate SRL on a suite of continuous control tasks, demonstrating that it achieves performance comparable to state-of-the-art algorithms while significantly reducing actor sample complexity. To better assess performance across varying decision frequencies, we introduce the Frequency-Averaged Score (FAS) metric. Our results show that SRL significantly outperforms traditional RL algorithms in terms of FAS, making it particularly suitable for applications requiring variable decision frequencies. Additionally, we compare SRL with model-based online planning, showing that SRL achieves superior FAS while leveraging the same model during training that online planners use for planning.
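
As a rough illustration of what control at a lower decision frequency means, the following Python sketch executes each action sequence open-loop and queries the policy only when the previous sequence is exhausted. It is not the SRL implementation: the function names, the one-dimensional toy dynamics, and the hand-written policy are hypothetical stand-ins for a trained actor and a real environment.

```python
from typing import Callable, Sequence, Tuple


def run_with_action_sequences(
    policy: Callable[[float], Sequence[float]],
    step: Callable[[float, float], float],
    initial_state: float,
    total_steps: int,
) -> Tuple[float, int]:
    """Roll out `policy`, observing the state only once per action sequence.

    Returns the final state and the number of decisions (policy queries) made.
    """
    state = initial_state
    decisions = 0
    t = 0
    while t < total_steps:
        actions = policy(state)  # one decision yields several primitive actions
        if len(actions) == 0:
            raise ValueError("policy must return at least one action")
        decisions += 1
        for action in actions:
            if t >= total_steps:
                break
            # The primitive action is applied without re-observing the state.
            state = step(state, action)
            t += 1
    return state, decisions


# Hypothetical toy setup: the state is a position, an action is a displacement.
SEQUENCE_LENGTH = 4


def toy_step(state: float, action: float) -> float:
    return state + action


def toy_policy(state: float) -> Sequence[float]:
    # Plan to reach the origin in SEQUENCE_LENGTH equal steps.
    return [-state / SEQUENCE_LENGTH] * SEQUENCE_LENGTH


final_state, num_decisions = run_with_action_sequences(
    toy_policy, toy_step, initial_state=8.0, total_steps=8
)
```

In this toy run, eight primitive steps need only two policy queries, so the decision frequency is a quarter of the actuation frequency. A policy that returns a single action per query recovers the usual one-decision-per-timestep loop.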
