FlowQ: Energy-Guided Flow Policies for Offline Reinforcement Learning

Abstract

The use of guidance to steer sampling toward desired outcomes has been widely explored within diffusion models, especially in applications such as image and trajectory generation. However, incorporating guidance during training remains relatively underexplored. In this work, we introduce energy-guided flow matching, a novel approach that enhances the training of flow models and eliminates the need for guidance at inference time. We learn a conditional velocity field corresponding to the flow policy by approximating an energy-guided probability path as a Gaussian path. Learning guided trajectories is appealing for tasks where the target distribution is defined by a combination of data and an energy function, as in reinforcement learning. Diffusion-based policies have recently attracted attention for their expressive power and ability to capture multi-modal action distributions. Typically, these policies are optimized using weighted objectives or by back-propagating gradients through actions sampled by the policy. As an alternative, we propose FlowQ, an offline reinforcement learning algorithm based on energy-guided flow matching. Our method achieves competitive performance while the policy training time is constant in the number of flow sampling steps.
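To make the last claim concrete, here is a minimal sketch of a plain conditional flow matching loss on a straight-line Gaussian path. It is not the energy-guided objective proposed in the paper, whose details the abstract does not give. The function names, the `velocity_fn` stand-in for a learned velocity field, and the choice of a linear path are all assumptions made for illustration. The point it shows is that each training sample needs a single velocity evaluation at a random time, with no ODE integration, so the cost does not grow with the number of sampling steps used at inference.

```python
import random


def linear_path_sample(x0, x1, t):
    """Point x_t on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Conditional velocity of the linear path: d/dt x_t = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, actions, rng):
    """Monte Carlo estimate of the conditional flow matching loss.

    velocity_fn(x_t, t) is a hypothetical stand-in for the learned
    velocity field. Each sample costs one velocity evaluation.
    """
    total = 0.0
    for x1 in actions:
        x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # standard Gaussian noise
        t = rng.random()                        # time drawn from [0, 1)
        x_t = linear_path_sample(x0, x1, t)
        v_pred = velocity_fn(x_t, t)
        v_tgt = target_velocity(x0, x1)
        total += sum((p - q) ** 2 for p, q in zip(v_pred, v_tgt))
    return total / len(actions)
```

In an offline reinforcement learning setting, `actions` would be dataset actions and the velocity field would also take the state as input; both are omitted here to keep the sketch short.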
