Abstract
We study the task of high-dimensional entangled mean estimation in thesubset-of-signals model. Specifically, given $N$ independent random points$x_1,\ldots,x_N$ in $\mathbb{R}^D$ and a parameter $\alpha \in (0, 1)$ suchthat each $x_i$ is drawn from a Gaussian with mean $\mu$ and unknowncovariance, and an unknown $\alpha$-fraction of the points haveidentity-bounded covariances, the goal is to estimate the common mean $\mu$.The one-dimensional version of this task has received significant attention intheoretical computer science and statistics over the past decades. Recent work[LY20; CV24] has given near-optimal upper and lower bounds for theone-dimensional setting. On the other hand, our understanding of even theinformation-theoretic aspects of the multivariate setting has remained limited. In this work, we design a computationally efficient algorithm achieving aninformation-theoretically near-optimal error. Specifically, we show that theoptimal error (up to polylogarithmic factors) is $f(\alpha,N) + \sqrt{D/(\alphaN)}$, where the term $f(\alpha,N)$ is the error of the one-dimensional problemand the second term is the sub-Gaussian error rate. Our algorithmic approachemploys an iterative refinement strategy, whereby we progressively learn moreaccurate approximations $\hat \mu$ to $\mu$. This is achieved via a novelrejection sampling procedure that removes points significantly deviating from$\hat \mu$, as an attempt to filter out unusually noisy samples. A complicationthat arises is that rejection sampling introduces bias in the distribution ofthe remaining points. To address this issue, we perform a careful analysis ofthe bias, develop an iterative dimension-reduction strategy, and employ a novelsubroutine inspired by list-decodable learning that leverages theone-dimensional result.