Incomplete U-Statistics of Equireplicate Designs: Berry-Esseen Bound and Efficient Construction

  • 2025-10-23 17:21:54
  • Cesare Miglioli, Jordan Awan
  • 0

Abstract

U-statistics are a fundamental class of estimators that generalize the samplemean and underpin much of nonparametric statistics. Although extensivelystudied in both statistics and probability, key challenges remain: their highcomputational cost - addressed partly through incomplete U-statistics - andtheir non-standard asymptotic behavior in the degenerate case, which typicallyrequires resampling methods for hypothesis testing. This paper presents a novelperspective on U-statistics, grounded in hypergraph theory and combinatorialdesigns. Our approach bypasses the traditional Hoeffding decomposition, themain analytical tool in this literature but one highly sensitive to degeneracy.By characterizing the dependence structure of a U-statistic, we derive aBerry-Esseen bound that applies to all incomplete U-statistics of deterministicdesigns, yielding conditions under which Gaussian limiting distributions can beestablished even in the degenerate case and when the order diverges. We alsointroduce efficient algorithms to construct incomplete U-statistics ofequireplicate designs, a subclass of deterministic designs that, in certaincases, achieve minimum variance. Finally, we apply our framework tokernel-based tests that use Maximum Mean Discrepancy (MMD) and Hilbert-SchmidtIndependence Criterion. In a real data example with CIFAR-10, ourpermutation-free MMD test delivers substantial computational gains whileretaining power and type I error control.

 

Quick Read (beta)

loading the full paper ...