BADM: Batch ADMM for Deep Learning

Abstract

Stochastic gradient descent-based algorithms are widely used for trainingdeep neural networks but often suffer from slow convergence. To address thechallenge, we leverage the framework of the alternating direction method ofmultipliers (ADMM) to develop a novel data-driven algorithm, called batch ADMM(BADM). The fundamental idea of the proposed algorithm is to split the trainingdata into batches, which is further divided into sub-batches where primal anddual variables are updated to generate global parameters through aggregation.We evaluate the performance of BADM across various deep learning tasks,including graph modelling, computer vision, image generation, and naturallanguage processing. Extensive numerical experiments demonstrate that BADMachieves faster convergence and superior testing accuracy compared to otherstate-of-the-art optimizers.

Quick Read (beta)

loading the full paper ...