From Learning with Partial Information to Bandits: Only Strict Nash Equilibria are Stable

Abstract

In this paper, we examine the Nash equilibrium convergence properties ofno-regret learning in general $N$-player games. Despite the importance andwidespread applications of no-regret algorithms, their long-run behavior inmulti-agent environments is still far from understood, and most of theliterature has focused by necessity on certain, specific classes of games(typically zero-sum or congestion games). Instead of focusing on a fixed classof games, we instead take a structural approach and examine different classesof equilibria in generic games. For concreteness, we focus on the archetypal"follow the regularized leader" (FTRL) class of algorithms, and we consider thefull spectrum of information uncertainty that the players may encounter - fromnoisy, oracle-based feedback, to bandit, payoff-based information. In thisgeneral context, we establish a comprehensive equivalence between the stabilityof a Nash equilibrium and its support: a Nash equilibrium is stable andattracting with arbitrarily high probability if and only if it is strict (i.e.,each equilibrium strategy has a unique best response). This result extendsexisting continuous-time versions of the "folk theorem" of evolutionary gametheory to a bona fide discrete-time learning setting, and provides an importantlink between the literature on multi-armed bandits and the equilibriumrefinement literature.

Quick Read (beta)

loading the full paper ...