Abstract
Diffusion models have seen tremendous success as generative architectures. Recently, they have been shown to be effective at modelling policies for offline reinforcement learning and imitation learning. We explore using diffusion as a model class for the successor state measure (SSM) of a policy. We find that enforcing the Bellman flow constraints leads to a simple Bellman update on the diffusion step distribution.