A Statistical Investigation of Long Memory in Language and Music

Abstract

Representation and learning of long-range dependencies is a central challenge confronted in modern applications of machine learning to sequence data. Yet despite the prominence of this issue, the basic problem of measuring long-range dependence, either in a given data source or as represented in a trained deep model, remains largely limited to heuristic tools. We contribute a statistical framework for investigating long-range dependence in current applications of sequence modeling, drawing on the statistical theory of long memory stochastic processes. By analogy with their linear predecessors in the time series literature, we identify recurrent neural networks (RNNs) as nonlinear processes that simultaneously attempt to learn both a feature representation for and the long-range dependency structure of an input sequence. We derive testable implications concerning the relationship between long memory in real-world data and its learned representation in a deep network architecture, which are explored through a semiparametric framework adapted to the high-dimensional setting. We establish the validity of statistical inference for a simple estimator, which yields a decision rule for long memory in RNNs. Experiments illustrating this statistical framework confirm the presence of long memory in a diverse collection of natural language and music data, but show that a variety of RNN architectures fail to capture this property even after training to benchmark accuracy in a language model.
