Online Saddle Point Problem with Applications to Constrained Online Convex Optimization

Abstract

We study an online saddle point problem where at each iteration a pair ofactions need to be chosen without knowledge of the future (convex-concave)payoff functions. The objective is to minimize the gap between the cumulativepayoffs and the saddle point value of the aggregate payoff function, which wemeasure using a metric called "SP-regret". The problem generalizes the onlineconvex optimization framework and can be interpreted as finding the Nashequilibrium for the aggregate of a sequence of two-player zero-sum games. Wepropose an algorithm that achieves $\tilde{O}(\sqrt{T})$ SP-regret in thegeneral case, and $O(\log T)$ SP-regret for the strongly convex-concave case.We then consider a constrained online convex optimization problem motivated bya variety of applications in dynamic pricing, auctions, and crowdsourcing. Werelate this problem to an online saddle point problem and establish$O(\sqrt{T})$ regret using a primal-dual algorithm.

Quick Read (beta)

loading the full paper ...