Efficient Video Generation on Complex Datasets

Abstract

Generative models of natural images have progressed towards high-fidelity samples by the strong leveraging of scale. We attempt to carry this success to the field of video modeling by showing that large Generative Adversarial Networks trained on the complex Kinetics-600 dataset are able to produce video samples of substantially higher complexity than previous work. Our proposed network, Dual Video Discriminator GAN (DVD-GAN), scales to longer and higher-resolution videos by leveraging a computationally efficient decomposition of its discriminator. We evaluate on the related tasks of video synthesis and video prediction, and achieve new state-of-the-art Fréchet Inception Distance on prediction for Kinetics-600, as well as state-of-the-art Inception Score for synthesis on the UCF-101 dataset, alongside establishing a number of strong baselines on Kinetics-600.
