Diffusion models have emerged as powerful generative models in the text-to-image domain. This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments. Human behaviour is stochastic and multimodal, with structured correlations between action dimensions. Meanwhile, standard modelling choices in behaviour cloning are limited in their expressiveness and may introduce bias into the cloned policy. We begin by pointing out the limitations of these choices. We then propose that diffusion models are an excellent fit for imitating human behaviour, since they learn an expressive distribution over the joint action space. We introduce several innovations to make diffusion models suitable for sequential environments: designing suitable architectures, investigating the role of guidance, and developing reliable sampling strategies. Experimentally, diffusion models closely match human demonstrations in a simulated robotic control task and a modern 3D gaming environment.