Learning Distributions Generated by One-Layer ReLU Networks

  • 2019-09-19 16:04:46
  • Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi
  • 0

Abstract

We consider the problem of estimating the parameters of a $d$-dimensionalrectified Gaussian distribution from i.i.d. samples. A rectified Gaussiandistribution is defined by passing a standard Gaussian distribution through aone-layer ReLU neural network. We give a simple algorithm to estimate theparameters (i.e., the weight matrix and bias vector of the ReLU neural network)up to an error $\epsilon||W||_F$ using $\tilde{O}(1/\epsilon^2)$ samples and$\tilde{O}(d^2/\epsilon^2)$ time (log factors are ignored for simplicity). Thisimplies that we can estimate the distribution up to $\epsilon$ in totalvariation distance using $\tilde{O}(\kappa^2d^2/\epsilon^2)$ samples, where$\kappa$ is the condition number of the covariance matrix. Our only assumptionis that the bias vector is non-negative. Without this non-negativityassumption, we show that estimating the bias vector within any error requiresthe number of samples at least exponential in the infinity norm of the biasvector. Our algorithm is based on the key observation that vector norms andpairwise angles can be estimated separately. We use a recent result on learningfrom truncated samples. We also prove two sample complexity lower bounds:$\Omega(1/\epsilon^2)$ samples are required to estimate the parameters up toerror $\epsilon$, while $\Omega(d/\epsilon^2)$ samples are necessary toestimate the distribution up to $\epsilon$ in total variation distance. Thefirst lower bound implies that our algorithm is optimal for parameterestimation. Finally, we show an interesting connection between learning atwo-layer generative model and non-negative matrix factorization. Experimentalresults are provided to support our analysis.

 

Quick Read (beta)

loading the full paper ...