Embedding and learning with signatures

Abstract

Sequential and temporal data arise in many fields of research, such as quantitative finance, medicine, or computer vision. The present article is concerned with a novel approach for sequential learning, called the signature method, which is rooted in rough path theory. Its basic principle is to represent multidimensional paths by a graded feature set of their iterated integrals, called the signature. This approach relies critically on an embedding principle, which consists of representing discretely sampled data as paths, i.e., functions from $[0,1]$ to $\mathbb{R}^d$. After a survey of machine learning methodologies for signatures, we investigate the influence of embeddings on prediction accuracy with an in-depth study of three recent and challenging datasets. We show that a specific embedding, called lead-lag, is systematically better than the other embeddings studied, regardless of the dataset or algorithm used. Moreover, we emphasize through an empirical study that computing signatures over the whole path domain does not lead to a loss of local information. We conclude that, with a good embedding, the signature combined with a simple algorithm achieves results competitive with state-of-the-art, domain-specific approaches.
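To make the two ingredients above concrete (an embedding of discrete data as a path, then the signature of that path), here is a minimal sketch in Python with NumPy. The helpers `lead_lag` and `signature_depth2` are hypothetical names introduced for illustration, not the authors' code or any library's API. The lead-lag convention used (the lead coordinate jumps to the next value first, then the lag catches up) is one common choice and is an assumption here. The signature is truncated at level 2 and computed for the piecewise-linear interpolation of the points via Chen's identity: appending a linear segment with increment Δ updates level 1 as S¹ + Δ and level 2 as S² + S¹ ⊗ Δ + Δ ⊗ Δ / 2.

```python
import numpy as np


def lead_lag(x):
    """Hypothetical helper: lead-lag embedding of a sequence.

    Maps an array of shape (n,) or (n, d) to a path of shape (2n - 1, 2d):
    the first d channels are the "lead", the last d channels the "lag".
    Assumed convention: the lead jumps to the next value first, then the
    lag catches up.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D series as a single channel
    doubled = np.repeat(x, 2, axis=0)  # x1, x1, x2, x2, ..., xn, xn
    lead = doubled[1:]   # x1, x2, x2, x3, x3, ..., xn, xn
    lag = doubled[:-1]   # x1, x1, x2, x2, ..., x_{n-1}, x_{n-1}, xn
    return np.hstack([lead, lag])


def signature_depth2(path):
    """Hypothetical helper: signature of a piecewise-linear path, truncated at level 2.

    Returns (s1, s2), where s1[i] is the total increment of channel i and
    s2[i, j] is the iterated integral of dX^i_s dX^j_t over s < t.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment delta.
        # s2 is updated before s1 so the cross term uses the signature of the
        # path before this segment.
        s2 = s2 + np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 = s1 + delta
    return s1, s2


if __name__ == "__main__":
    x = [0.0, 1.0, 3.0]
    path = lead_lag(x)               # points (0,0), (1,0), (1,1), (3,1), (3,3)
    s1, s2 = signature_depth2(path)
    print(s1)                        # [3. 3.]
    print(s2[0, 1] - s2[1, 0])       # 5.0, i.e. 1**2 + 2**2
```

With this convention, s2[0, 1] - s2[1, 0] equals the sum of squared increments of the original series, so the signature of the embedded path carries the series' quadratic variation. The signature of the plain one-dimensional path cannot see this, since its level-k term depends only on the total increment. In practice, deeper truncation levels and a dedicated signature package would typically be used.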
