An Optimal Multistage Stochastic Gradient Method for Minimax Problems

Abstract

In this paper, we study the minimax optimization problem in the smooth andstrongly convex-strongly concave setting when we have access to noisy estimatesof gradients. In particular, we first analyze the stochastic Gradient DescentAscent (GDA) method with constant stepsize, and show that it converges to aneighborhood of the solution of the minimax problem. We further provide tightbounds on the convergence rate and the size of this neighborhood. Next, wepropose a multistage variant of stochastic GDA (M-GDA) that runs in multiplestages with a particular learning rate decay schedule and converges to theexact solution of the minimax problem. We show M-GDA achieves the lower boundsin terms of noise dependence without any assumptions on the knowledge of noisecharacteristics. We also show that M-GDA obtains a linear decay rate withrespect to the error's dependence on the initial error, although the dependenceon condition number is suboptimal. In order to improve this dependence, weapply the multistage machinery to the stochastic Optimistic Gradient DescentAscent (OGDA) algorithm and propose the M-OGDA algorithm which also achievesthe optimal linear decay rate with respect to the initial error. To the best ofour knowledge, this method is the first to simultaneously achieve the bestdependence on noise characteristic as well as the initial error and conditionnumber.

Quick Read (beta)

loading the full paper ...