Conditional Normalizing Flows (CNFs) are flexible generative models capable of representing complicated distributions with high dimensionality and large interdimensional correlations, making them appealing for structured output learning. Their effectiveness in modeling multivariate spatio-temporal structured data has yet to be completely investigated. We propose MotionFlow as a novel normalizing flows approach that autoregressively conditions the output distributions on the spatio-temporal input features. It combines deterministic and stochastic representations with CNFs to create a probabilistic neural generative approach that can model the variability seen in high-dimensional structured spatio-temporal data. We specifically propose to use conditional priors to factorize the latent space for time-dependent modeling. We also exploit masked convolutions as autoregressive conditionals in CNFs. As a result, our method is able to define arbitrarily expressive output probability distributions under temporal dynamics in multivariate prediction tasks. We apply our method to different tasks, including trajectory prediction, motion prediction, time series forecasting, and binary segmentation, and demonstrate that our model is able to leverage normalizing flows to learn complicated time-dependent conditional distributions.
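
To make the idea of a conditional flow with a conditional prior concrete, the following is a minimal one-dimensional sketch, not the MotionFlow architecture itself. It assumes a single affine transformation whose shift and log-scale depend on a context feature `c`, together with a unit-variance Gaussian prior whose mean also depends on `c`. The functions `conditioner` and `prior_mean` and their coefficients are hypothetical stand-ins for learned networks; the masked-convolution conditioners and autoregressive factorization described above are not modeled here.

```python
import math


def conditioner(c):
    """Hypothetical conditioner: maps a context feature c to (shift, log_scale)."""
    return 0.5 * c, 0.1 * c


def prior_mean(c):
    """Hypothetical conditional prior mean; the prior is N(prior_mean(c), 1)."""
    return -c


def forward(z, c):
    """Generative direction: latent z -> output x, given context c."""
    shift, log_scale = conditioner(c)
    return z * math.exp(log_scale) + shift


def inverse(x, c):
    """Inference direction: output x -> latent z, given context c."""
    shift, log_scale = conditioner(c)
    return (x - shift) * math.exp(-log_scale)


def log_prob(x, c):
    """Conditional log-density via the change-of-variables formula."""
    _, log_scale = conditioner(c)
    z = inverse(x, c)
    # Log-density of z under the conditional prior N(prior_mean(c), 1).
    log_prior = -0.5 * (z - prior_mean(c)) ** 2 - 0.5 * math.log(2.0 * math.pi)
    # For this affine map, log|dz/dx| = -log_scale.
    return log_prior - log_scale
```

Because the transformation is affine and the prior is Gaussian, the resulting density in this toy case is itself Gaussian, which gives a closed form to check the sketch against. Expressive flows stack many such invertible layers with richer conditioners.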