Streaming PCA and Subspace Tracking: The Missing Data Case

  • 2018-06-12 15:32:17
  • Laura Balzano, Yuejie Chi, Yue M. Lu
  • 3


For many modern applications in science and engineering, data are collectedin a streaming fashion carrying time-varying information, and practitionersneed to process them with a limited amount of memory and computationalresources in a timely manner for decision making. This often is coupled withthe missing data problem, such that only a small fraction of data attributesare observed. These complications impose significant, and unconventional,constraints on the problem of streaming Principal Component Analysis (PCA) andsubspace tracking, which is an essential building block for many inferencetasks in signal processing and machine learning. This survey article reviews avariety of classical and recent algorithms for solving this problem with lowcomputational and memory complexities, particularly those applicable in the bigdata regime with missing data. We illustrate that streaming PCA and subspacetracking algorithms can be understood through algebraic and geometricperspectives, and they need to be adjusted carefully to handle missing data.Both asymptotic and non-asymptotic convergence guarantees are reviewed.Finally, we benchmark the performance of several competitive algorithms in thepresence of missing data for both well-conditioned and ill-conditioned systems.


Introduction (beta)



Conclusion (beta)