Dictionary-Learning-Based Data Pruning for System Identification

Abstract

System identification typically involves augmenting time series data by time shifting and nonlinearisation (e.g., a polynomial basis), both of which introduce redundancy in features and samples. Much existing research focuses on reducing feature-wise redundancy, while less attention has been paid to sample-wise redundancy. This paper proposes a novel data pruning method, called mini-batch FastCan, to reduce sample-wise redundancy based on dictionary learning. The time series data is represented by a small set of representative samples, called atoms, obtained via dictionary learning. Useful samples are then selected based on their correlation with the atoms. The method is tested on one simulated dataset and two benchmark datasets. The R-squared between the coefficients of models trained on the full datasets and the coefficients of models trained on the pruned datasets is adopted to evaluate the performance of data pruning methods. The proposed method is found to significantly outperform random pruning.
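The pipeline summarised in the abstract can be illustrated with a minimal sketch. It is not the paper's mini-batch FastCan algorithm: a simple spherical k-means stands in for dictionary learning, the toy system, the lag order, the number of atoms, and all function names (`make_features`, `learn_atoms`, `prune`, `coef_r2`) are assumptions made only for illustration. The sketch builds time-shifted polynomial features, learns atoms, keeps the samples most correlated with each atom, and compares the coefficients of the full and pruned models with R-squared.

```python
import numpy as np

def make_features(u, y, n_lags=2):
    """Time-shifted regressors plus degree-2 polynomial terms (illustrative setup)."""
    rows = []
    for t in range(n_lags, len(y)):
        lagged = np.concatenate([y[t - n_lags:t], u[t - n_lags:t]])
        # Degree-2 basis: linear terms plus all pairwise products.
        quad = np.outer(lagged, lagged)[np.triu_indices(len(lagged))]
        rows.append(np.concatenate([lagged, quad]))
    return np.asarray(rows), y[n_lags:]

def learn_atoms(X, n_atoms, n_iter=20, seed=0):
    """Spherical k-means as a stand-in for dictionary learning (assumption)."""
    rng = np.random.default_rng(seed)
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    atoms = Xn[rng.choice(len(Xn), n_atoms, replace=False)]
    for _ in range(n_iter):
        labels = np.argmax(Xn @ atoms.T, axis=1)  # assign by cosine similarity
        for k in range(n_atoms):
            members = Xn[labels == k]
            if len(members):
                c = members.sum(axis=0)
                atoms[k] = c / (np.linalg.norm(c) + 1e-12)
    return atoms

def prune(X, atoms, per_atom):
    """Keep, for each atom, the samples with the highest absolute cosine similarity."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    sim = np.abs(Xn @ atoms.T)                   # shape (n_samples, n_atoms)
    top = np.argsort(-sim, axis=0)[:per_atom]    # shape (per_atom, n_atoms)
    return np.unique(top)                        # sorted, de-duplicated indices

def coef_r2(coef_full, coef_pruned):
    """R-squared of the pruned-model coefficients against the full-model ones."""
    ss_res = np.sum((coef_full - coef_pruned) ** 2)
    ss_tot = np.sum((coef_full - coef_full.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy nonlinear system, invented for this sketch only.
rng = np.random.default_rng(0)
n = 1000
u = rng.uniform(-1.0, 1.0, n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = (0.5 * y[t - 1] + 0.3 * u[t - 1]
            - 0.2 * y[t - 1] * u[t - 1] + 0.01 * rng.standard_normal())

X, y_target = make_features(u, y)
atoms = learn_atoms(X, n_atoms=10)
idx = prune(X, atoms, per_atom=10)

# Fit linear-in-parameters models on the full and the pruned data.
coef_full = np.linalg.lstsq(X, y_target, rcond=None)[0]
coef_pruned = np.linalg.lstsq(X[idx], y_target[idx], rcond=None)[0]
score = coef_r2(coef_full, coef_pruned)
print(f"kept {len(idx)} of {len(X)} samples, coefficient R-squared = {score:.3f}")
```

Comparing coefficients rather than prediction error asks whether the pruned data leads to the same identified model, which is the evaluation criterion the abstract describes. The score obtained from this toy setup says nothing about the results reported in the paper.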
