Improved asynchronous parallel optimization analysis for stochastic incremental methods

Abstract

As datasets continue to increase in size and multi-core computerarchitectures are developed, asynchronous parallel optimization algorithmsbecome more and more essential to the field of Machine Learning. Unfortunately,conducting the theoretical analysis asynchronous methods is difficult, notablydue to the introduction of delay and inconsistency in inherently sequentialalgorithms. Handling these issues often requires resorting to simplifying butunrealistic assumptions. Through a novel perspective, we revisit and clarify asubtle but important technical issue present in a large fraction of the recentconvergence rate proofs for asynchronous parallel optimization algorithms, andpropose a simplification of the recently introduced "perturbed iterate"framework that resolves it. We demonstrate the usefulness of our new frameworkby analyzing three distinct asynchronous parallel incremental optimizationalgorithms: Hogwild (asynchronous SGD), KROMAGNON (asynchronous SVRG) andASAGA, a novel asynchronous parallel version of the incremental gradientalgorithm SAGA that enjoys fast linear convergence rates. We are able to bothremove problematic assumptions and obtain better theoretical results. Notably,we prove that ASAGA and KROMAGNON can obtain a theoretical linear speedup onmulti-core systems even without sparsity assumptions. We present results of animplementation on a 40-core architecture illustrating the practical speedups aswell as the hardware overhead. Finally, we investigate the overlap constant, anill-understood but central quantity for the theoretical analysis ofasynchronous parallel algorithms. We find that it encompasses much morecomplexity than suggested in previous work, and often is order-of-magnitudebigger than traditionally thought.

Quick Read (beta)

loading the full paper ...