Context plays a significant role in the generation of motion for dynamic agents in interactive environments. This work proposes a modular method that utilises a model of the environment to aid motion prediction of tracked agents. This paper shows that modelling the spatial and dynamic aspects of a given environment alongside the local per-agent behaviour results in more accurate and informed long-term motion prediction. Further, we observe that this decoupling of dynamics and environment models allows for better generalisation to unseen environments, requiring that only a spatial representation of a new environment be learned. We highlight the model's prediction capability using a benchmark pedestrian tracking problem and by tracking a robot arm performing a tabletop manipulation task. The proposed approach allows for robust and data-efficient forward modelling, and relaxes the need for full model re-training in new environments. We evaluate this through an ablation study, which shows better performance when the representation modules are decoupled, in addition to improved generalisation on tasks with dynamics unseen at training time.