An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

  • 2018-03-04 00:20:29
  • Shaojie Bai, J. Zico Kolter, Vladlen Koltun


For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks.