Online Double Oracle

  • 2022-05-16 17:43:15
  • Le Cong Dinh, Yaodong Yang, Stephen McAleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang
  • 0

Abstract

Solving strategic games with huge action space is a critical yetunder-explored topic in economics, operations research and artificialintelligence. This paper proposes new learning algorithms for solvingtwo-player zero-sum normal-form games where the number of pure strategies isprohibitively large. Specifically, we combine no-regret analysis from onlinelearning with Double Oracle (DO) methods from game theory. Our method --\emph{Online Double Oracle (ODO)} -- is provably convergent to a Nashequilibrium (NE). Most importantly, unlike normal DO methods, ODO is\emph{rationale} in the sense that each agent in ODO can exploit strategicadversary with a regret bound of $\mathcal{O}(\sqrt{T k \log(k)})$ where $k$ isnot the total number of pure strategies, but rather the size of \emph{effectivestrategy set} that is linearly dependent on the support size of the NE. On tensof different real-world games, ODO outperforms DO, PSRO methods, and no-regretalgorithms such as Multiplicative Weight Update by a significant margin, bothin terms of convergence rate to a NE and average payoff against strategicadversaries.

 

Quick Read (beta)

loading the full paper ...