Federated PCA with Adaptive Rank Estimation

Abstract

In many online machine learning and data science tasks such as datasummarisation and feature compression, $d$-dimensional vectors are usuallydistributed across a large number of clients in a decentralised network andcollected in a streaming fashion. This is increasingly common in modernapplications due to the sheer volume of data generated and the clients'constrained resources. In this setting, some clients are required to compute anupdate to a centralised target model independently using local data while otherclients aggregate these updates with a low-complexity merging algorithm.However, some clients with limited storage might not be able to store all ofthe data samples if $d$ is large, nor compute procedures requiring at least$\Omega(d^2)$ storage-complexity such as Principal Component Analysis, SubspaceTracking, or general Feature Correlation. In this work, we present a novelfederated algorithm for PCA that is able to adaptively estimate the rank $r$ ofthe dataset and compute its $r$ leading principal components when only $O(dr)$memory is available. This inherent adaptability implies that $r$ does not haveto be supplied as a fixed hyper-parameter which is beneficial when theunderlying data distribution is not known in advance, such as in a streamingsetting. Numerical simulations show that, while using limited-memory, ouralgorithm exhibits state-of-the-art performance that closely matches oroutperforms traditional non-federated algorithms, and in the absence ofcommunication latency, it exhibits attractive horizontal scalability.

Quick Read (beta)

loading the full paper ...