Abstract
Given $n$ i.i.d. random matrices $A_i \in \mathbb{R}^{d \times d}$ that sharea common expectation $\Sigma$, the objective of Differentially PrivateStochastic PCA is to identify a subspace of dimension $k$ that captures thelargest variance directions of $\Sigma$, while preserving differential privacy(DP) of each individual $A_i$. Existing methods either (i) require the samplesize $n$ to scale super-linearly with dimension $d$, even under Gaussianassumptions on the $A_i$, or (ii) introduce excessive noise for DP even whenthe intrinsic randomness within $A_i$ is small. Liu et al. (2022a) addressedthese issues for sub-Gaussian data but only for estimating the top eigenvector($k=1$) using their algorithm DP-PCA. We propose the first algorithm capable ofestimating the top $k$ eigenvectors for arbitrary $k \leq d$, whilst overcomingboth limitations above. For $k=1$ our algorithm matches the utility guaranteesof DP-PCA, achieving near-optimal statistical error even when $n =\tilde{\!O}(d)$. We further provide a lower bound for general $k > 1$, matchingour upper bound up to a factor of $k$, and experimentally demonstrate theadvantages of our algorithm over comparable baselines.