CNNs are Globally Optimal Given Multi-Layer Support

  • 2017-12-14 14:21:43
  • Chen Huang, Chen Kong, Simon Lucey

Abstract

Stochastic Gradient Descent (SGD) is the central workhorse for training modern CNNs. Although it gives impressive empirical performance, it can be slow to converge. In this paper we explore a novel alternation strategy for training a CNN that offers substantial speedups during training. We make the following contributions: (i) we replace the ReLU non-linearity within a CNN with positive hard-thresholding, (ii) we reinterpret this non-linearity as a binary state vector, making the entire CNN linear if the multi-layer support is known, and (iii) we demonstrate that under certain conditions a global optimum of the CNN can be found through local descent. We then employ a novel alternation strategy (between weights and support) for CNN training that leads to substantially faster convergence rates, has nice theoretical properties, and achieves state-of-the-art results on large-scale datasets (e.g. ImageNet) as well as other standard benchmarks.
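Contribution (ii) can be illustrated with a small sketch. It is not the authors' implementation: it assumes two dense layers in place of convolutions, a single scalar threshold lam, and positive hard-thresholding defined as "keep a value if it exceeds lam, otherwise output zero". All function names are hypothetical. The point it shows is that once the binary support of the hidden layer is fixed, the network output equals a plain linear map applied to the input.

```python
import random

def hard_threshold(v, lam):
    """Positive hard-thresholding (assumed form): keep entries above lam, zero the rest."""
    return [x if x > lam else 0.0 for x in v]

def support(v, lam):
    """Binary state vector: 1.0 where a unit is active, 0.0 elsewhere."""
    return [1.0 if x > lam else 0.0 for x in v]

def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def matmul(A, B):
    """Matrix-matrix product for matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def mask_rows(W, s):
    """Equivalent to diag(s) @ W: zero the rows of W that belong to inactive units."""
    return [[si * w for w in row] for row, si in zip(W, s)]

def forward(x, W1, W2, lam):
    """Two-layer network with hard-thresholding; also returns the hidden support."""
    pre = matvec(W1, x)
    hidden = hard_threshold(pre, lam)
    return matvec(W2, hidden), support(pre, lam)

def linear_map_given_support(W1, W2, s):
    """With the support s fixed, the whole network collapses to W2 @ diag(s) @ W1."""
    return matmul(W2, mask_rows(W1, s))

# Small random instance (sizes and threshold are illustrative assumptions).
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
x = [rng.uniform(-1, 1) for _ in range(3)]
lam = 0.1

y, s = forward(x, W1, W2, lam)
A = linear_map_given_support(W1, W2, s)
y_linear = matvec(A, x)

# The two outputs agree up to floating-point rounding.
max_err = max(abs(a - b) for a, b in zip(y, y_linear))
```

The linear map A is valid only for inputs that produce the same support s, which is why an alternation between updating the weights and updating the support is a natural fit for this formulation.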
