Folded Recurrent Neural Networks for Future Video Prediction

  • 2018-03-16 15:15:24
  • Marc Oliu, Javier Selva, Sergio Escalera

Abstract

Future video prediction is an ill-posed Computer Vision problem that has recently received much attention. Its main challenges are the high variability in video content, the propagation of errors through time, and the non-specificity of the future frames: given a sequence of past frames there is a continuous distribution of possible futures. This work introduces bijective Gated Recurrent Units, a double mapping between the input and output of a GRU layer. This allows for recurrent auto-encoders with state sharing between encoder and decoder, stratifying the sequence representation and helping to prevent capacity problems. We show how with this topology only the encoder or decoder needs to be applied for input encoding and prediction, respectively. This reduces the computational cost and avoids re-encoding the predictions when generating a sequence of frames, mitigating the propagation of errors. Furthermore, it is possible to remove layers from an already trained model, giving insight into the role performed by each layer and making the model more explainable. We evaluate our approach on three video datasets, outperforming state-of-the-art prediction results on MMNIST and UCF101, and obtaining competitive results on KTH with 2 and 3 times less memory usage and computational cost than the best-scoring approach.
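
The double mapping and the state sharing described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes small dense GRU cells in pure Python (a video model would presumably use convolutional layers), and the names GRUCell, BijectiveGRU, FoldedStack, observe and predict are hypothetical, chosen only for this illustration. Each layer holds two sets of GRU weights, one mapping its input to its output and one mapping back, and both directions read and write the same list of states. Encoding applies only the upward mappings, while prediction applies only the downward ones, so predicted frames are never re-encoded.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Plain dense GRU cell: step(x, h) returns the updated state."""

    def __init__(self, n_in, n_state, rng):
        def init():
            return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_state)]
                    for _ in range(n_state)]
        self.w_z, self.w_r, self.w_c = init(), init(), init()

    @staticmethod
    def _affine(w, v):
        return [sum(a * b for a, b in zip(row, v)) for row in w]

    def step(self, x, h):
        z = [_sigmoid(a) for a in self._affine(self.w_z, x + h)]  # update gate
        r = [_sigmoid(a) for a in self._affine(self.w_r, x + h)]  # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        c = [math.tanh(a) for a in self._affine(self.w_c, x + gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]


class BijectiveGRU:
    """Assumed reading of the double mapping: two GRUs over shared states."""

    def __init__(self, n_low, n_high, rng):
        self.up = GRUCell(n_low, n_high, rng)    # input -> output (encoder)
        self.down = GRUCell(n_high, n_low, rng)  # output -> input (decoder)

    def encode(self, low, high):
        return self.up.step(low, high)

    def decode(self, high, low):
        return self.down.step(high, low)


class FoldedStack:
    """Stack of bijective layers sharing one list of states (hypothetical)."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        self.layers = [BijectiveGRU(a, b, rng)
                       for a, b in zip(sizes, sizes[1:])]
        self.states = [[0.0] * n for n in sizes]  # states[0] is frame level

    def observe(self, frame):
        # Encoder pass only: update the states bottom-up from a real frame.
        self.states[0] = list(frame)
        for i, layer in enumerate(self.layers):
            self.states[i + 1] = layer.encode(self.states[i],
                                              self.states[i + 1])

    def predict(self):
        # Decoder pass only: update the states top-down, no re-encoding.
        for i in reversed(range(len(self.layers))):
            self.states[i] = self.layers[i].decode(self.states[i + 1],
                                                   self.states[i])
        return list(self.states[0])


model = FoldedStack(sizes=[4, 3, 2])
for frame in ([0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]):
    model.observe(frame)
predictions = [model.predict() for _ in range(3)]
```

In this sketch the weights are random and untrained, so the predictions carry no meaning; the point is only the control flow. Because every layer's state is shared by both directions, dropping the last entry of layers and states leaves a smaller model that still runs, which is one way to read the claim about removing layers from a trained model.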

 
