Trellis Networks for Sequence Modeling

  • 2018-10-15 20:50:05
  • Shaojie Bai, J. Zico Kolter, Vladlen Koltun

Abstract

We present trellis networks, a new architecture for sequence modeling. On the one hand, a trellis network is a temporal convolutional network with special structure, characterized by weight tying across depth and direct injection of the input into deep layers. On the other hand, we show that truncated recurrent networks are equivalent to trellis networks with special sparsity structure in their weight matrices. Thus trellis networks with general weight matrices generalize truncated recurrent networks. We leverage these connections to design high-performing trellis networks that absorb structural and algorithmic elements from both recurrent and convolutional models. Experiments demonstrate that trellis networks outperform the current state of the art on a variety of challenging benchmarks, including word-level language modeling on Penn Treebank and WikiText-103, character-level language modeling on Penn Treebank, and stress tests designed to evaluate long-term memory retention. The code is available at https://github.com/locuslab/trellisnet.
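The two structural claims above (weight tying across depth with input injection, and truncated recurrent networks as sparse trellis networks) can be checked numerically on a toy example. The following is a minimal plain-Python sketch, not the authors' implementation from the repository. It assumes a causal convolution with kernel size 2 over the concatenation [x_t; z_t] and a plain tanh nonlinearity (the paper's own activation may differ). The helper names `trellis_layer`, `trellis_net`, `truncated_rnn` and `sparse_trellis_weights` are hypothetical. With W1 = [0 | W_h] and W2 = [W_x | 0], an L-layer trellis network should reproduce an Elman RNN whose history is truncated to the last L inputs.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def trellis_layer(x, z, W1, W2):
    """One (hypothetical) trellis layer: causal conv, kernel size 2, over [x_t; z_t].

    Output position t combines the concatenated vector at t-1 (via W1) with
    the one at t (via W2), then applies tanh. Position -1 is zero-padded.
    """
    width = len(x[0]) + len(z[0])
    out = []
    for t in range(len(x)):
        prev = x[t - 1] + z[t - 1] if t > 0 else [0.0] * width
        cur = x[t] + z[t]
        pre = [a + b for a, b in zip(matvec(W1, prev), matvec(W2, cur))]
        out.append([math.tanh(p) for p in pre])
    return out

def trellis_net(x, W1, W2, hidden, num_layers):
    """Reuse the SAME W1, W2 at every depth (weight tying) and feed the raw
    input x into every layer (input injection)."""
    z = [[0.0] * hidden for _ in x]
    for _ in range(num_layers):
        z = trellis_layer(x, z, W1, W2)
    return z

def truncated_rnn(x, Wx, Wh, hidden, window):
    """Elman RNN h_t = tanh(Wh h_{t-1} + Wx x_t), restarted from a zero state
    so that each output only sees the last `window` inputs."""
    outputs = []
    for t in range(len(x)):
        h = [0.0] * hidden
        for s in range(max(0, t - window + 1), t + 1):
            h = [math.tanh(a + b) for a, b in zip(matvec(Wh, h), matvec(Wx, x[s]))]
        outputs.append(h)
    return outputs

def sparse_trellis_weights(Wx, Wh):
    """Embed the RNN in block-sparse trellis weights: W1 = [0 | Wh], W2 = [Wx | 0]."""
    d_in, hidden = len(Wx[0]), len(Wh)
    W1 = [[0.0] * d_in + list(row) for row in Wh]
    W2 = [list(row) + [0.0] * hidden for row in Wx]
    return W1, W2

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d_in, hidden, T, L = 3, 4, 6, 3
Wx, Wh = rand_matrix(hidden, d_in), rand_matrix(hidden, hidden)
x = rand_matrix(T, d_in)  # toy sequence of T input vectors

W1, W2 = sparse_trellis_weights(Wx, Wh)
z = trellis_net(x, W1, W2, hidden, num_layers=L)
ref = truncated_rnn(x, Wx, Wh, hidden, window=L)
max_err = max(abs(a - b) for zt, rt in zip(z, ref) for a, b in zip(zt, rt))
print(f"max |trellis - truncated RNN| = {max_err:.2e}")
```

Replacing the block-sparse W1 and W2 with dense matrices gives a general trellis layer of this toy form, which illustrates the sense in which trellis networks with general weight matrices generalize truncated recurrent networks.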
