### Abstract

This study analyzes the Fisher information matrix (FIM) by applying mean-field theory to deep neural networks with random weights. We theoretically find novel statistics of the FIM, which are universal among a wide class of deep networks with any number of layers and various activation functions. Although most of the FIM's eigenvalues are close to zero, the maximum eigenvalue takes on a huge value and the eigenvalue distribution has an extremely long tail. These statistics suggest that the loss landscape is locally flat in most dimensions but strongly distorted in the others. Moreover, our theory of the FIM leads to a quantitative evaluation of learning in deep networks. First, the maximum eigenvalue enables us to estimate an appropriate learning rate for gradient descent methods to converge. Second, the flatness induced by the small eigenvalues is connected to generalization ability through a norm-based capacity measure.
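The eigenvalue statistics described above can be probed numerically. The following is a minimal sketch, not the mean-field calculation itself: it assumes a tanh network with a single linear output and a unit-variance Gaussian output model, so that the FIM reduces to the average outer product of output gradients. The function names (`random_mlp`, `output_gradient`, `fim_spectrum`), the layer widths, and the weight scales are all illustrative assumptions. The learning-rate bound `2 / lam_max` is the classical stability condition for gradient descent on a quadratic approximation of the loss, used here as a stand-in for the estimate the abstract refers to.

```python
import numpy as np

def random_mlp(widths, sigma_w=1.0, sigma_b=0.1, seed=0):
    """Draw Gaussian random weights and biases (hypothetical scales)."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, sigma_w / np.sqrt(n_in), size=(n_out, n_in))
        b = rng.normal(0.0, sigma_b, size=n_out)
        params.append((W, b))
    return params

def output_gradient(params, x):
    """Gradient of the scalar network output w.r.t. all parameters, flattened."""
    hs, us = [x], []
    for i, (W, b) in enumerate(params):
        u = W @ hs[-1] + b
        us.append(u)
        is_last = i == len(params) - 1
        hs.append(u if is_last else np.tanh(u))  # linear output layer
    delta = np.ones_like(us[-1])  # d(output) / d(pre-activation of last layer)
    grads = []
    for i in reversed(range(len(params))):
        W, _ = params[i]
        # Weight gradient is outer(delta, layer input); bias gradient is delta.
        grads.append(np.concatenate([np.outer(delta, hs[i]).ravel(), delta]))
        if i > 0:
            delta = (W.T @ delta) * (1.0 - np.tanh(us[i - 1]) ** 2)
    return np.concatenate(grads[::-1])

def fim_spectrum(params, X):
    """Nonzero eigenvalues of the empirical FIM F = J^T J / N, descending."""
    J = np.stack([output_gradient(params, x) for x in X])  # shape (N, P)
    gram = J @ J.T / len(X)  # N x N matrix sharing F's nonzero eigenvalues
    return np.linalg.eigvalsh(gram)[::-1], J.shape[1]

widths = [50, 100, 100, 100, 1]  # assumed architecture, for illustration only
params = random_mlp(widths)
X = np.random.default_rng(1).normal(size=(200, widths[0]))

eig, n_params = fim_spectrum(params, X)
lam_max = eig[0]
lam_mean = eig.sum() / n_params  # trace(F) / P; the remaining eigenvalues are zero
eta_bound = 2.0 / lam_max        # step sizes below this keep quadratic GD stable

print(f"parameters: {n_params}")
print(f"mean eigenvalue: {lam_mean:.3e}")
print(f"max eigenvalue:  {lam_max:.3e}")
print(f"learning-rate bound 2/lam_max: {eta_bound:.3e}")
```

With far fewer samples than parameters, the empirical FIM has at most as many nonzero eigenvalues as samples, so the gap between the mean and the maximum eigenvalue in this sketch is partly a rank effect. Comparing the printed values across depths and widths gives a rough feel for the long-tailed spectrum, but it does not replace the theoretical statistics.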
