Generalized Grade-of-Membership Estimation for High-dimensional Locally Dependent Data

  • 2024-12-27 18:51:15
  • Ling Chen, Chengzhu Huang, Yuqi Gu

Abstract

This work focuses on the mixed membership models for multivariate categorical data widely used for analyzing survey responses and population genetics data. These grade of membership (GoM) models offer rich modeling power but present significant estimation challenges for high-dimensional polytomous data. Popular existing approaches, such as Bayesian MCMC inference, are not scalable and lack theoretical guarantees in high-dimensional settings. To address this, we first observe that data from this model can be reformulated as a three-way (quasi-)tensor, with many subjects responding to many items with varying numbers of categories. We introduce a novel and simple approach that flattens the three-way quasi-tensor into a "fat" matrix and then performs a singular value decomposition of it to estimate parameters by exploiting the singular subspace geometry. Our fast spectral method can accommodate a broad range of data distributions with arbitrarily locally dependent noise, which we formalize as the generalized-GoM models. We establish finite-sample entrywise error bounds for the generalized-GoM model parameters. This is supported by a new sharp two-to-infinity singular subspace perturbation theory for locally dependent and flexibly distributed noise, a contribution of independent interest. Simulations and applications to data in political surveys, population genetics, and single-cell sequencing demonstrate our method's superior performance.
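As a rough illustration of the flattening step described in the abstract, the sketch below one-hot encodes categorical responses (N subjects, J items, item j having C_j categories) into an N x (C_1 + ... + C_J) "fat" matrix and takes its leading singular vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `flatten_responses` and `top_singular_subspace`, the one-hot encoding, the toy data, and the choice of rank are all hypothetical, and the step that recovers membership scores from the singular subspace geometry is not shown.

```python
import numpy as np


def flatten_responses(responses, n_categories):
    """One-hot encode an (N, J) integer response array into an
    N x sum(C_j) matrix, one block of C_j columns per item.
    (Hypothetical helper; the encoding is an assumption.)"""
    responses = np.asarray(responses)
    n_subjects, n_items = responses.shape
    # Starting column of each item's block in the flattened matrix.
    offsets = np.concatenate(([0], np.cumsum(n_categories)[:-1]))
    fat = np.zeros((n_subjects, int(np.sum(n_categories))))
    rows = np.arange(n_subjects)
    for j in range(n_items):
        fat[rows, offsets[j] + responses[:, j]] = 1.0
    return fat


def top_singular_subspace(matrix, rank):
    """Return the leading `rank` left singular vectors, singular values,
    and right singular vectors of `matrix`."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank]


# Toy data: 6 subjects, 3 items with 2, 3, and 4 categories (assumed sizes).
rng = np.random.default_rng(0)
n_categories = np.array([2, 3, 4])
responses = np.column_stack(
    [rng.integers(0, c, size=6) for c in n_categories]
)

fat = flatten_responses(responses, n_categories)   # shape (6, 9)
u, s, vt = top_singular_subspace(fat, rank=2)      # rank 2 is illustrative
```

Each row of `fat` contains exactly one 1 per item block, so items with different numbers of categories sit side by side without padding; the rows of `u` are the per-subject coordinates in the estimated singular subspace.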

 
