Analyzing Multi-Stage Loss Curve: Plateau and Descent Mechanisms in Neural Networks

  • 2024-10-26 09:16:00
  • Zheng-An Chen, Tao Luo, GuiHong Wang

Abstract

The multi-stage phenomenon in the training loss curves of neural networks has been widely observed, reflecting the non-linearity and complexity inherent in the training process. In this work, we investigate the training dynamics of neural networks (NNs), with particular emphasis on the small initialization regime and identify three distinct stages observed in the loss curve during training: initial plateau stage, initial descent stage, and secondary plateau stage. Through rigorous analysis, we reveal the underlying challenges causing slow training during the plateau stages. Building on existing work, we provide a more detailed proof for the initial plateau. This is followed by a comprehensive analysis of the dynamics in the descent stage. Furthermore, we explore the mechanisms that enable the network to overcome the prolonged secondary plateau stage, supported by both experimental evidence and heuristic reasoning. Finally, to better understand the relationship between global training trends and local parameter adjustments, we employ the Wasserstein distance to capture the microscopic evolution of weight amplitude distribution.
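
As a rough illustration of the last point, the sketch below compares the weight amplitude distributions of two parameter snapshots using the one-dimensional Wasserstein-1 distance. It is not taken from the paper: the definition of "amplitude" as the Euclidean norm of each neuron's weight vector, the helper names `wasserstein_1d` and `amplitudes`, and the toy snapshots are all assumptions made for this example. For two empirical samples of equal size, the Wasserstein-1 distance reduces to the mean absolute difference between their sorted values, which is what the code computes.

```python
import random


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D empirical samples."""
    if len(a) != len(b):
        raise ValueError("samples must have equal size")
    xs, ys = sorted(a), sorted(b)
    # Optimal 1D transport matches the i-th smallest of a to the i-th smallest of b.
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def amplitudes(weights):
    """Assumed definition: amplitude = Euclidean norm of each neuron's weights."""
    return [sum(w * w for w in row) ** 0.5 for row in weights]


rng = random.Random(0)

# Toy snapshots (hypothetical): 100 neurons with 3 input weights each.
# `early` mimics a small initialization; `later` mimics uniform growth.
early = [[rng.gauss(0.0, 1e-3) for _ in range(3)] for _ in range(100)]
later = [[2.0 * w for w in row] for row in early]

d = wasserstein_1d(amplitudes(early), amplitudes(later))
print(f"W1 between amplitude distributions: {d:.6e}")
```

Tracking this quantity between consecutive checkpoints would give one scalar per training step that reflects how much the amplitude distribution moved, which can then be set against the loss curve. Whether this matches the exact construction used by the authors is not stated in the abstract.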

 
