New Potential-Based Bounds for Prediction with Expert Advice

  • 2020-06-29 16:53:26
  • Vladimir A. Kobzar, Robert V. Kohn, Zhilei Wang
  • 0

Abstract

This work addresses the classic machine learning problem of online predictionwith expert advice. We consider the finite-horizon version of this zero-sum,two-person game. Using verification arguments from optimal control theory, weview the task of finding better lower and upper bounds on the value of the game(regret) as the problem of finding better sub- and supersolutions of certainpartial differential equations (PDEs). These sub- and supersolutions serve asthe potentials for player and adversary strategies, which lead to thecorresponding bounds. To get explicit bounds, we use closed-form solutions ofspecific PDEs. Our bounds hold for any given number of experts and horizon; incertain regimes (which we identify) they improve upon the previous state of theart. For two and three experts, our bounds provide the optimal leading orderterm.

 

Quick Read (beta)

[

Vladimir A. Kobzar [email protected]
Center for Data Science, New York University, 60 Fifth Ave., New York, New York \ANDRobert V. Kohn [email protected]
   Zhilei Wang [email protected]
Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, New York
Abstract

This work addresses the classic machine learning problem of online prediction with expert advice. We consider the finite-horizon version of this zero-sum, two-person game. Using verification arguments from optimal control theory, we view the task of finding better lower and upper bounds on the value of the game (regret) as the problem of finding better sub- and supersolutions of certain partial differential equations (PDEs). These sub- and supersolutions serve as the potentials for player and adversary strategies, which lead to the corresponding bounds. To get explicit bounds, we use closed-form solutions of specific PDEs. Our bounds hold for any given number of experts and horizon; in certain regimes (which we identify) they improve upon the previous state of the art. For two and three experts, our bounds provide the optimal leading order term.

New Potential-Based Bounds for Prediction with Expert Advice]New Potential-Based Bounds for Prediction with Expert Advice

[ Vladimir A. Kobzar [email protected]
Center for Data Science, New York University, 60 Fifth Ave., New York, New York
Robert V. Kohn [email protected]and Zhilei Wang [email protected]
Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, New York

1 Introduction

The classic machine learning problem of online prediction with expert advice (the expert problem) is a repeated two-person zero-sum game with the following structure. At each round, the predictor (player) uses guidance from a collection of experts with the goal of minimizing the difference (regret) between the player’s loss and that of the best performing expert in hindsight. The environment (adversary) determines the losses of each expert for that round. The player’s selection of the experts and the adversary’s choice of the loss for each expert are revealed to both parties, and this prediction process is repeated until the final round.

We will focus on the following representative definition of this problem, which mirrors (up to translation and rescaling of the loss) the version considered in recent work on optimal strategies (gravin16; abbasi). {tcolorbox} Prediction with expert advice: At each period t[T], (a) the player determines which of the N experts to follow by selecting a discrete probability distribution ptΔN; (b) the adversary allocates losses to the experts by selecting a probability distribution at over the hypercube [-1,1]N; and (c) the expert losses qt[-1,1]N and the player’s choice of the expert It[N] are sampled from at and pt, respectively, and revealed to both parties. In this setting a=(at)t[T] and p=(pt)t[T] refer to, respectively, the adversary and player strategies or simply the adversary and player. The player strategy may be known to the adversary, and vice versa. In general, each strategy at time t can depend on the history of losses and choices of the expert in previous periods. However, the flow of information above implies that, conditioned on the history, qt and It are independent. We consider the finite horizon version of the problem, where the number of periods T is fixed and the regret is RT(p,a)=𝔼p,a[t[T]qIt,t-minit[T]qi,t].

Numerous strategies attain vanishing per round regret. For example, the exponentially weighted forecaster pe provides the upper bound maxaRT(pe,a)2TlogN. Also for all ϵ>0, there exist N and T sufficiently large, such that the randomized adversary ar (which assigns 1 or -1 to each component of q independently with equal probability) approaches that bound: (1-ϵ)2TlogNminpRT(p,ar).11 1 See cesa-bianchi97 and Theorems 2.2 and 3.7 in cesa-bianchi06. These results are rescaled here to apply to [-1,1]N, instead of [0,1]N, losses.

A minmax optimal player (optimal player) is a player that minimizes the regret over all possible adversaries and a minmax optimal adversary (optimal adversary) is an adversary that maximizes the regret over all possible players. Thus, pe and ar are optimal asymptotically in T and N.

Nonasymptotic optimal strategies have been determined explicitly using random walk methods for N=2, and, up to the leading order term, for N=3 (cover; gravin16; abbasi). For general N, optimal strategies can be found using dynamic programming and depend only on the cumulative losses of each expert and the remaining time, rather than the full history of adversary’s and/or player’s choices (cesa-bianchi97). luo determined optimal strategies in the version of the problem where the adversary’s choice of losses is restricted to the set of standard basis vectors. However, optimal strategies for the original game have not been determined explicitly.

In a related line of work, strategies that are optimal asymptotically in T have been determined by PDE-based methods. For N=2, Zhu established that the value function is given by the solution of a 1D linear heat equation, which provides a continuous perspective on the random walk characterization of the non-asymptotic problem. drenska2019prediction showed that for any N, the value function, in a scaling limit, is the unique solution of an associated nonlinear PDE. bayraktar2019b found closed-form solutions of the PDEs for N=3 and 4.

Due to the complexity of determining optimal strategies for an arbitrary N, it is common to use potential functions to bound the regret above. For example, pe uses the logarithm of the sum of the exponentials of regret with respect to each expert as the potential; the corresponding upper bound is obtained by bounding the evolution of this potential for all possible adversaries. chaudhuri and luo_ada proposed other potential-based player algorithms for variations of the expert problem with different notions of regret and/or additional structure.

rakhlin2012 proposed a principled way of deriving potential-based player strategies by bounding above the value function in a manner that is consistent with its recursive minmax form. rokhlin suggested using supersolutions of the asymptotic PDE as potentials for player strategies leading to upper bounds. The present paper extends these ideas by applying related arguments to broad classes of potentials, and by providing lower as well as upper bounds.

Adversary strategies have been commonly studied as random processes. For example, for any N, ar guarantees that the leading order regret is bounded below by the expectation of the maximum of N i.i.d. Gaussians.22 2 See Theorem 3.7 in cesa-bianchi06. This guarantee is based on the central limit theorem and is therefore asymptotic in T. Nonasymptotic lower bounds have been established using random walk methods. (orabona; gyorgy).

The player’s and the adversary’s selection of strategies is fundamentally a problem of optimal control. Adopting such a viewpoint, in this paper we propose a control-based framework for designing strategies for the expert problem using sub- and supersolutions of certain PDEs. Our principal conceptual advances are the following.

  1. 1.

    The potential-based framework is extended to adversary strategies, leading to lower bounds (Section 3).

  2. 2.

    The task of finding better regret bounds reduces to the mathematical problem of finding better sub- and supersolutions of certain PDEs (See Equations (2b) and (4b)).

  3. 3.

    Our bounds hold for any given number of experts and are nonasymptotic in T; their rate of convergence to the asymptotic (in T) value is determined explicitly using error estimates similar to those used in finite difference schemes in numerical analysis. (Theorems 1 and 3).

These conceptual advances not only provide a fresh perspective on the expert problem, but also lead in some cases to improved bounds. Specifically, we apply our framework to two classes of potentials. The first class is discussed in Section 5, where we use classical solutions of the linear heat equation with suitable diffusion factors as lower and upper bound potentials. The leading order term of the resulting lower bound is the expectation of the maximum of N i.i.d. Gaussians with mean zero, and is therefore similar to the existing lower bound given by ar. However, the constant factor of the leading order term (i.e., the variance of the Gaussians) is state-of-the-art. Additionally, we improve the bounds on the higher order (error) terms (Section 7.2).

A second class of potentials is discussed in Section 6. They are closed-form solutions of a nonlinear PDE where the spatial operator involves the largest diagonal entry of the Hessian. For up to three experts, the lower and upper bounds obtained using this potential match to leading order as the number of time steps approaches infinity. Therefore, the corresponding strategies are optimal to leading order. The same leading order result for three experts was determined in abbasi; our approach, however, provides a smaller error term. Also, for small N and relatively large number of time steps our upper bound is tighter than the one obtained using pe (Section 7.2).

2 Notation

The “spatial variables” and “spatial derivatives” of a function u(x,t) are xN and the derivatives of u with respect to x. For a multi-index I, I refers to the partial derivative and dxI refers to the differential with respect to the spatial variable(s) in I, and dx^I refers to the differential with respect to all except the spatial variables in I. D2u, D3u and D4u refer to the Hessian, 3rd derivative, and 4th derivative of u with respect to x (which are 2nd order, 3rd order and 4th order tensors respectively); the associated multilinear forms D2uq,q, D3u[q,q,q], D4u[q,q,q,q] are i,jijuqiqj, i,j,kijkuqiqjqk and i,j,k,lijkluqiqjqkql.

Prediction with expert advice is a repeated two-person game. It is convenient to denote the time t by nonpositive numbers such that the starting time is T-1 and the final time is zero. The vector rτ=qIτ,τ𝟙-qτ denotes the player’s losses realized in round τ relative to those of each expert (instantaneous regret) and the vector xt=τ<trτ denotes the player’s cumulative losses realized before the outcome of round t relative to those of each expert (cumulative regret or simply the regret).

If u is a function of space and time, subscripts x or t denote partial derivatives (so ux and ut are first derivatives and uxx, uxt and utt are second derivatives). In other settings, the subscript t is an index; in particular, our adversary and player strategies at time t are at and pt and the expert losses and player’s choice at time t are qt and It. When no confusion will result, we sometimes omit the index t, writing for example q rather than qt; in such a setting, qi refers to the ith component of qt.

If u is a function, Δu=i2uxi2 is its Laplacian; however, the standalone symbol ΔN refers to the set of probability distributions on {1,,N}. [T] denotes the set {1,,T} if T1 or {T,,-1} if T-1. 𝟙 is a vector in N with all components equal to 1, but 𝟙S refers to the indicator function of the set S.

A classical solution of a partial differential equation (PDE) on a specified region is a solution such that all derivatives appearing in the statement of the PDE exist and are continuous on the specified region.

3 Lower Bounds

Our lower bounds are associated with well-chosen strategies for the adversary. We shall consider adversary strategies that are Markovian, in the sense that the strategy at time t can depend only on the cumulative regret x and time t. For a given adversary a, it is natural to consider the associated value function va(x,t), defined as the final-time regret achieved by the adversary (assuming the player behaves optimally) if the prediction game starts at time t with cumulative regret vector x. It is characterized by a dynamic program (DP):33 3 Our use of dynamic programming is related to the arguments used in Section 3 in cesa-bianchi97 to show that the optimal strategies are Markovian. Our use is different, however, (and simpler) since we assume from the start that the adversary’s strategy is Markovian.

va(x,0) =maxixiandva(x,t)=minp𝔼at,pva(x+r,t+1)fort-1 (1)

Working backward in time, the DP determines the player’s optimal strategy at each time. It is clearly Markovian, in the sense that this strategy depends only on the time t and the cumulative regret x at that time.

In the context of lower bounds, we shall consider only adversaries that assign the same expectation of each component of q: 𝔼atq=ct𝟙 for some ct[-1,1] and all t<-1 (balanced adversaries). To bound va below, we introduce the following class of potential functions, or simply potentials. As described more fully in Section 7.1, such a potential bounds below the minimax optimal (asymptotically in T) value because the potential is a subsolution of the nonlinear PDE (13) obtained in drenska2019prediction. {tcolorbox} A lower bound potential is a function u:N×0 such that for each xN and t<0, there is a balanced strategy at on [-1,1]N ensuring that u is a classical solution of

ut+12𝔼atD2uq,q0 (2a)
u(x,0)maxixiandu(x+c𝟙,t)=u(x,t)+c (2b)

At t<-1, an adversary associated with u is a balanced strategy at such that (2a) is satisfied at (x,t+1). At t=-1, any distribution a-1 over [-1,1]N may be used. We prove in Appendix A, using induction backward in time, that this potential bound below the adversary’s optimal value va, modulo an “error” term E(t) which can be estimated explicitly. This provides a lower bound on regret since va(0,T)=minpRT(a,p). Note that while the definition of va involves an optimization over the player strategy p, the definition of u does not. Examination of the proof (in Appendix A) reveals that our lower bound is insensitive to p because the adversary strategy a is balanced.

Theorem 1 (Lower bound)

Let u be a lower bound potential and let va be the value function of the associated adversary a. Then, u(x,t)-E(t)va(x,t) where the error term E(t)=C+τ=t-2K(τ) is computed using: (i) a bound on the decrease of u at the last period, which is a constant C satisfying u(x,-1)-minp𝔼a-1,pu(x+r,0)C for all x, and (ii) an error estimate K of the Taylor approximation of u in the earlier periods. If ut(x,) and D2u(,τ+1) are Lipschitz continuous, then any function K satisfying 12ess supτ¯[τ,τ+1]utt(x,τ¯)+16ess supy[x,x-q]D3u(y,τ+1)[q,q,q]K(τ) for all τ[t,-2], all q in the support of aτ and all x, may be used to compute E(t).

If the adversary assigns the same probability to q and -q to each q in its support (a symmetric adversary) and the potential is smooth enough, there is an alternative estimate for the error term, proved in Appendix B, which in some examples gives a better result.

Proposition 2 (Symmetric adversary and smooth potential)

If the adversary a associated with u is symmetric, and D3u(,t+1) exists and is Lipschitz continuous, then in Theorem 1 any function K satisfying 12ess supτ¯[τ,τ+1]utt(x,τ¯)-124ess infy[x,x-q]D4u(y,τ+1)[q,q,q,q]K(τ) for all τ[t,-2], all q in the support of aτ and all x, may be used to compute E(t).

In what follows, we will apply our framework to obtain a fresh perspective on the best existing lower bounds and we will obtain improved lower bounds. Specifically, in Example 2, using the heat potential φ given by (6) with the diffusion factor κ=12, we recover the well-known asymptotic lower bound associated with the randomized adversary ar. We also show that the so-called comb adversary ac does at least as well as ar at leading order in the limit as |T|. By applying Proposition 2, we obtain explicit nonasymptotic bounds for both adversaries. In Example 3, we introduce a new heat adversary ah, associated with the heat potential with a higher diffusion factor κh>12, which improves upon the lower bound associated with ar and ac. For N=2, ah is asymptotically optimal.44 4 For N=2, ac is the same as ah.

Section 6 applies our framework to an adversary associated with the new max potential ψ given by (11). This adversary is asymptotically optimal for N=2 and 3.55 5 For N=2, the adversary associated the max potential is identical to ac and ah.

4 Upper Bounds

Our upper bounds are associated with strategies for the player given by the gradient of specific potentials. We shall only consider potentials that can depend, at time t, only on the cumulative regret x and time t. Consequently, our player strategies are Markovian. In parallel to the discussion above, for a given player p, we consider the value function vp(x,t) defined as the final-time regret achieved by this player (assuming the adversary behaves optimally) if the prediction game begins at time t with cumulative regret vector x. It is characterized by the following DP:

vp(x,0)=maxixiandvp(x,t)=maxa𝔼a,ptvp(x+r,t+1)fort-1 (3)

Working backward in time, this DP determines the adversary’s optimal strategy at each time, and this strategy is also Markovian.

To bound vp above, we introduce the following class of potentials. As described more fully in Section 7.1, such a potential bounds above the minimax optimal (asymptotically in T) value because the potential is a supersolution of the PDE (13).

{tcolorbox}

An upper-bound potential is a function w:N×0, which is nondecreasing as a function of each xi, and which is, for all xN and t<0 is a classical solution of

wt+12maxq[-1,1]ND2wq,q0 (4a)
w(x,0)maxixiandw(x+c𝟙,t)=w(x,t)+c (4b)

The player strategy p associated with w is: At t<-1, the player selects pt=w(x,t+1), and at t=-1, the player selects an arbitrary distribution p-1ΔN. At t<-1, since w is nondecreasing in each xi, pi,t0. Also iiw=1 by linearity of w along 𝟙, which implies that ipi,t=1. Therefore, at t<-1, ptΔN as well.

The following Theorem is proved in Appendix C using induction backward in time. It shows that an upper bound potential w bounds above for the value function vp, modulo an “error” term E(t). This provides an upper bound on the regret since maxaRT(a,p)=vp(0,T). The argument (which is parallel to that for Theorem 1) uses Taylor expansion to estimate how w changes as regret accumulates. The player strategy ensures that the first-order term of the Taylor expansion vanishes regardless of the adversary strategy at.

Theorem 3 (Upper bound)

Let w be an upper bound potential and let vp be the value function of the associated player p. Then, vp(x,t)w(x,t)+E(t) where the error term E(t)=C+τ=t-2K(τ) is computed using: (i) the bound on the increase of w at the last period, which is a constant C satisfying maxa𝔼a,p-1w(x+r,0)-w(x,-1)C for all x, and (ii) an error estimate K of the Taylor approximation of w in the earlier periods. If wt(x,) and D2w(,τ+1) are Lipschitz continuous, then any function K satisfying -12ess infτ¯[τ,τ+1]wtt(x,τ¯)-16ess infy[x,x-q]D3w(y,τ+1)[q,q,q]K(τ) for all τ[t,-2], all q[-1,1]N and all x may be used to compute E(t).

If an upper bound potential has the form

w(x,t)=Φ(x)+ct (5)

for a constant c, the player w(x) does not depend on time. Therefore, we can let the player strategy to be w(x) at t=-1, instead of an arbitrary distribution. The following Proposition, proved in Appendix D, is similar to Theorem 1 in rokhlin, and in this setting, the error term does not appear.

Proposition 4 (Certain potentials)

If, in the setting of Theorem 3, w has the form (5), and the player strategy is w(x) in all periods, then vp(x,t)w(x,t).

As an example, we recover the classic upper bound for the exponentially weighted forecaster pe. Let the potential we be given by we(x,t)=Φ(x)-12ηt where Φ(x)=1ηlog(k[N]eηxk). In Appendix E, we show that maxq[-1,1]ND2Φq,qη. Also we(x,0)maxixi, and Φ(x+c𝟙)=Φ(x)+c, which imply the same results for we. Therefore, we satisfies (4b) and Proposition 4 provides the following result.

Example 1 (Exponential weights)

For the value function vpe of pe, the following upper bound holds vpe(x,t)we(x,t). Taking η=2logN|T| leads to the regret bound: maxaR(a,pe)we(0,T)=2|T|logN.66 6 This example provides the best known upper bound for pe and therefore gives a PDE perspective on Theorem 2.2 of cesa-bianchi06 (rescaled here to reflect [-1,1]N losses).

5 Heat Potentials

In this section, we consider the heat potential φ given by

φ(x,t) =αe-y22σ2maxk(xk-yk)𝑑y (6)

where α=(2πσ2)-N2 and σ2=-2κt. The linearity of the max function in the direction of 𝟙 implies that φ(x+c𝟙,t)=φ(x,t)+c. This potential is the classical solution, on N×<0, of the following linear heat equation

{φt+κΔφ=0φ(x,0)=maxix

Therefore, φ satisfies (2b) and (4b). Let G denote a N-dimensional Gaussian vector with mean 0 and identity covariance. By the definition of the heat potential,

φ(0,T)=-2κT𝔼GmaxGi (7)

Let El.b.φ denote the error term within the meaning of Theorem 1 for the lower bound potential φ with any κ[12,1] and any adversary supported on {±1}N. Appendix F.2 shows that since φ is smooth, by Proposition 2 this term is O(NN) uniformly in t. Theorem 1 is also available and provides El.b.φ(t)=O(NlogN+Nlog|t|). Therefore, El.b.φ(t)=O(NNNlogN+Nlog|t|).

Let Eu.b.φ denote the error term within the meaning of Theorem 3 for upper bound potential φ with κ=1. Appendix F.3 shows that Eu.b.φ(t)=O(NlogN+Nlog|t|).77 7 While the asymptotic notation is used here for conciseness, the Appendices provide explicit error bounds.

We consider the classic randomized adversary ar defined in Section 1. Since it is symmetric, the mixed terms ijφqiqj have zero expectation, and consequently 𝔼arD2φq,q=Δφ. Therefore, a lower bound potential ur=φ with κ=12 also satisfies (2a), and we recover the classic asymptotic lower bound for ar with a new nonasymptotic error term in Example 2. Moreover, since both inequalities in (2b) are satisfied with equalities, the proof of Theorem 1 shows that the difference between var and ur is entirely attributable to the error term El.b.φ. Therefore, ur has the same leading order term as var, i.e., limT-1|T|(ur(x,T)-var(x,T))=0.

We can use the same potential ur to analyze the so-called comb adversary ac, which is defined via ranked coordinates {(i)}i[N] such that x(1)x(2)x(N). {tcolorbox} At each t, the comb adversary ac assigns probability 12 to each of qc and -qc where q(i)c=1 if i is odd and q(i)c=-1 if i is even. In Appendix G, we show that D2φqc,qcΔφr. Therefore, ur combined with the adversary ac also satisfies (2b). gravin16 conjectured that ac might be optimal asymptotically in T for any fixed N and abbasi and bayraktar2019b showed that to be the case for N=3 and 4, respectively. We do not resolve this conjecture for general N, and since (2a) is not satisfied with an equality, our analysis does not guarantee that ur has the same leading order term as vac. However, our result shows that the ac is at least as powerful as ar. The following example summarizes this result and the previous one.

Example 2 (Randomized and comb adversaries)

Let ur be the heat potential φ with κ=12. Then, the value function var of ar satisfies the following lower bound: ur(x,t)-El.b.φ(t)var(x,t). Also ur has the same leading order term in t as var. By equation (7), this bound leads to the regret bound |T|𝔼GmaxGi-El.b.φ(T)minpRT(ar,p).

The same lower bound holds for the value function vac of ac (without a guarantee that ur matches vac at the leading order).

Since limN12logN𝔼maxiGi=1,88 8 See, e.g, Lemmas A.12 in cesa-bianchi06. we have limNlimT-ur(x,T)-El.b.φ(T)2|T|logN=1. Thus, in the limit where T- first, and then N, the value function vac of the comb adversary ac matches the upper bound given by the exponential weights player pe. Therefore, this adversary is doubly asymptotically optimal (previously this was only known for ar).

Next, we introduce a new adversary ah (heat adversary). {tcolorbox} At each t, the heat adversary ah samples qt uniformly from the following set S:

S={q{±1}Ni[N]qi=±1}ifNis oddor{q{±1}Ni[N]qi=0}ifNis even.

This adversary is symmetric because it is the uniform distribution over the symmetric set S. In Appendix H, we show that κhΔφ=12𝔼ahD2φq,q for

κh=1ifN=2,12+12NifN is odd,or12+12N-2otherwise. (8)

The potential uh given by φ with the diffusion factor κ=κh, combined with the adversary ah, satisfies (2b). Also both inequalities in (2b) are satisfied with equalities, and therefore, uh has the same leading order term in t as vah. The resulting lower bound is described in Example 3.

Similar ideas are used to give an upper bound. In Appendix I, we show that 12maxq[-1,1]ND2φq,qΔφ. Also in Appendix F.1, we prove iφ0 for all i[N]. Thus, wh given by φ with κ=1 satisfies (4b) and is associated with the following strategy. {tcolorbox} At each t<-1, the heat player ph selects pth=wh(x,t+1) and, at t=-1, the player selects an arbitrary distribution in ΔN.

Example 3 (New heat-based strategies)

The value function vah of ah satisfies the lower bound uh(x,t)-El.b.φ(t)vah(x,t), and the value function vph of ph satisfies the upper bound vph(x,t)wh(x,t)+Eu.b.φ(t), where uh and wh are the potentials given above. Also uh has the same leading order term in t as vah. Using equation (7), these bounds lead to the regret bounds 2κh|T|𝔼GmaxGi-El.b.φ(T)minpRT(ah,p) and maxaRT(a,ph)2|T|𝔼GmaxGi+Eu.b.φ(T).

For two experts, the lower and upper bounds in the Example above have a matching leading order term 2π|T|. Therefore, the corresponding strategies are minmax optimal asymptotically in T.

6 Max Potentials

In this section, we consider the max potential ψ given by the solution of:

{ψt+κmaxii2ψ=0ψ(x,0)=maxixi (9)

abbasi, using random walk methods, showed that an adversary am associated with ψ (the max adversary) is asymptotically in T optimal for N=3. {tcolorbox} At each t, the max adversary am assigns equal probability to qm and -qm where the entry of qm corresponding to the largest component of x is 1 and the remaining entries are -1.

There is an explicit formula for ψ. Its building blocks are functions of the form g(x,t)=-2κtf(x-2κt) where

f(z)=2πe-z22+zerf(z2) anderf(y)=2π0ye-s2𝑑s. (10)

As shown in Appendix J, f solves f(z)=f′′(z)+zf(z) with lim|z|f(z)|z|=1. Therefore, g(x,t) solves the 1D linear heat equation on ×<0: gt+κgxx=0 with g(x,0)=|x|. We define ψ globally in a uniform manner using ranked coordinates given in Section 5, and verify the following Claim in Appendix J.

Claim 5

Equation (9) has an explicit classical solution on N×<0, namely

ψ(x,t) =1Nix(i)+-2κtl=1N-1clf(zl) (11)

where zl=1-2κt((n=1lx(n))-lx(l+1)), f is given by (10) and cl=1l(l+1).

Since zl does not change when a multiple of 𝟙 is added to x, we have ψ(x+c𝟙,t)=ψ(x,t)+c. Therefore, ψ satisfies (2b) and (4b).

Appendix K shows that D2ψqm,qm=4maxjjjψ. Therefore, um given by ψ with κ=2 satisfies (2a) for the adversary am. Also both inequalities in (2b) are satisfied with equalities. Therefore, similarly to the discussion of ur and uh in Section 5, um has the same leading order term as vam. The resulting lower bound is given in Example 4.

To determine an upper bound, in Appendix L, we prove that 12maxq[-1,1]ND2ψq,qκmmaxii2ψ for

κm=N22(N-1)ifNis evenorN+12ifNis odd (12)

Also in Appendix J.1 we show iψ0 for all i in [N]. Therefore, an upper bound potential wm given by ψ with κ=κm satisfies (4b) and is associated with the following strategy (max player). {tcolorbox} The max player pm selects ptm=wm(x,t+1) at t<-1 and an arbitrary p-1ΔN at t=-1.

Since the formula (11) for ψ uses ranked coordinates, particular scrutiny is needed on the boundaries where the ranking changes. The calculation in Appendix J.2 reveals that the third-order spatial derivatives do not exist on those boundaries. Therefore, Proposition 2 is not available in this setting.

Let El.b.ψ denote the error term within the meaning of Theorem 1 for ψ with κ=2 and the associated adversary am. Appendix M.1 shows that El.b.ψ(t)=O(Nlog|t|). Let Eu.b.ψ denote the “error” term within the meaning of Theorem 3 for ψ with κ=κm. Appendix M.2 shows that Eu.b.ψ(t)=O(Nlog|t|) as well.99 9 While the asymptotic notation is used here for conciseness as well, the Appendix provides explicit error bounds.

Example 4 (Max-based strategies)

The value function vam of am satisfies the lower bound um(x,t)-El.b.ψ(t)vam(x,t) and the value function vpm of pm satisfies the upper bound vpm(x,t)wm(x,t)+Eu.b.ψ(t), where um and wm are the potentials defined above. Also um has the same leading order term in t as vam. Since ψ(0,T)=2(N-1)Nκπ|T|, the regret satisfies the bounds 2(N-1)N2π|T|-El.b.ψ(T)minpRT(am,p) and maxaRT(a,pm)2(N-1)Nκmπ|T|+Eu.b.ψ(T).

The lower and upper bounds have the matching leading order term of 2π|T| and 429π|T| for, respectively, two and three experts. Therefore, the corresponding strategies are minmax optimal asymptotically in T. The same leading order constant for three experts was determined in abbasi (after rescaling for our [-1,1]N loss function) with an O(log2|T|) error term. Our method, however, reduces the error to O(log|T|).

7 Related Work

In this Section, we first describe the relationship of our potentials to the PDE characterizing minimax optimal value. Second, we compare our bounds with the previously known ones.

7.1 PDE Characterizing Minimax Optimal Value

The fact that our bounds for N=2,3 match asymptotically can be understood from a PDE perspective. Indeed, our upper and lower-bound heat and max potentials for N=2 are the same. Our upper and lower-bound max potentials for N=3 are the same as well. They all solve the PDE derived as in drenska2019prediction that, as noted earlier, characterizes the asymptotically optimal result. This observation can also be found in bayraktar2019b (for N=4, however, the solution of the relevant PDE is different from our potentials).

drenska2019prediction showed that, for any fixed N, the leading order term of the minimax value function is the unique viscosity solution of the associated nonlinear PDE. Although the {0,1}N adversary in that reference is different from our [-1,1]N adversary, this is not consequential. Thus, the relevant PDE, as adjusted for our adversary, is the following:

{vt+12maxq[-1,1]ND2vq,q=0v(x,0)=maxixi (13)

Since for an arbitrary N the solution v is not known explicitly, the PDE (13) does not provide a numerical estimate of the regret; moreover it only describes the leading order behavior as |T|. Our framework, by contrast, gives explicit upper and lower bounds, which hold for any T.

While our framework does not use the PDE (13), it is not unrelated. Indeed, since a lower bound potential u must satisfy (2b), it has maxq[-1,1]ND2uq,q𝔼qatD2uq,q. Therefore, u is a so-called subsolution of (13). Since these PDEs have a comparison principle, uv. Similarly, an upper bound potential w given by a solution of (4b) is a supersolution of (13), which implies vw.

While the preceding remarks provide insight about why our potentials work, they rely upon the comparison principle for viscosity solutions of (13) – a result which is by no means elementary. Our arguments (which build on the insight of rokhlin) are, by contrast, entirely elementary, using little more than Taylor expansion. (Our overall framework, presented in Appendices A and C, resembles a “verification argument” from optimal control theory.)

7.2 Relationship to Existing Bounds

Figure 1: Plots of CN with N where CN=(u(0,T)-E(T))/|T| for a lower bound (l.b.) potential u and the associated adversary a (the resulting l.b. is CN|T|minpRT(a,p)) and CN equal to (w(0,T)+E(T))/|T| for an upper bound (u.b.) potential w and the corresponding player p (the resulting u.b. is maxaRT(a,p)CN|T|). Each CN is determined for T=-107 except where it is specified to be asymptotic in T. Plot \subreffig:l compares the improved l.b. with previously known l.b. and u.b’s and plot \subreffig:2 shows the improved u.b. for small N (the exponential weights u.b. various l.b.’s and the asymptotically optimal C4 are also plotted for reference).

Note that κh is strictly larger than 12 for any given N. Therefore, asymptotically in T, the lower bound attained by our heat adversary ah is tighter than the one attained by the classic randomized adversary ar.

When N and T are fixed, a bound obtained using ar is provided by in orabona; their argument involves lower bounding the maximum of N independent symmetric random walks of length |T|. Another lower bound is given in Chapter 7 of gyorgy for an adversary as constructed from a single random walk of length |T|. This as provides a tighter lower bound than our ah when |T| is relatively small. However, as illustrated by Figure 1\subreffig:l, when |T| is large, our strategy ah improves on the lower bound obtained using as. (The lower bound given by orabona is not shown because its value is negative for the given T and range of N.)

Turning to the upper bounds: when N is small and |T| is large, as illustrated by Figure 1\subreffig:2, the max player pm improves on the upper bound given by the exponential weights pe. (The heat player ph also improves on pe in this setting.) See Appendix N for details regarding the numerical computation of these bounds.

8 Conclusions

We establish that potentials can be used to design effective strategies leading to lower bounds as well as upper bounds. We also provide a scheme by which solutions of well-chosen PDEs can be used as upper bound or lower bound potentials. The resulting bounds improve in some cases upon the previously known bounds.

While this paper focuses on the fixed horizon version of the expert problem, kobzar_geom extends our framework to the geometric stopping version, where the final time is not fixed but is rather random, chosen from the geometric distribution.

Acknowledgments

V.A.K and R.V.K. are supported, in part, by NSF grant DMS-1311833. V.A.K. is also supported by the Moore-Sloan Data Science Environment at New York University.

References

A Proof of Theorem 1

Since va is characterized by the dynamic program (1), we show that u(x,t)-E(t)va(x,t) by induction starting from the final time. The initial step follows from the inequality between va and u at t=0. To prove the inductive step, as a preliminary result, we bound below minp𝔼at,p[u(x+r,t+1)]-u(x,t) in terms of C and K(t). At t=-1, the conditions of the theorem already provide:

minp𝔼a-1,p[u(x+r,0)]-u(x,-1)-C

For t-2, we note that r=qI𝟙-q and use the linearity of u in the direction of 𝟙:

minp𝔼p,at[u(x+r,t+1)]-u(x,t)
=minp𝔼p,at[u(x-q,t+1)+qI]-u(x,t+1)+u(x,t+1)-u(x,t)

Since u(,t+1) is C2 with Lipschitz continuous second-order derivatives in x, we use Taylor’s theorem with the integral remainder

u(x-q,t+1)= u(x,t+1)-u(x,t+1)q+12D2u(x,t+1)q,q
-01D3u(x-μq,t+1)[q,q,q](1-μ)22𝑑μ (14)

Thus,

u(x-q,t+1)-u(x,t+1)+qI qI-u(x,t+1)q+12D2u(x,t+1)q,q
-16ess supy[x,x-q]D3u(y,t+1)[q,q,q] (15)

Similarly, u(x,t+1)-u(x,t)ut(x,t+1)-12ess supt¯[t,t+1]utt(x,t¯).

The rules of the game provide that q distributed according to at and I distributed according to pt are independent conditioned on history. Therefore, 𝔼p,at[qI-u(x,t+1)q]=p-u(x,t+1),𝔼atq=0 for all p since at is balanced and iiu=1 by linearity of u along 𝟙. As a result, we can eliminate the dependence on p. Also we note the condition on the potential (2a).

Using the foregoing results and the definition of K, we obtain

minp𝔼p,at[u(x+r,t+1)]-u(x,t)-K(t)=E(t+1)-E(t) (16)

Finally, using (16), the inductive hypothesis u(x+r,t+1)-E(t+1)va(x+r,t+1), and the dynamic program formulation of va in (1), we obtain

u(x,t)-E(t) u(x,t)+minp𝔼p,atu(x+r,t+1)-u(x,t)-E(t+1)
minp𝔼p,at[va(x+r,t+1)]=va(x,t)

B Proof of Proposition 2

If D3u(,t+1) exists and is Lipschitz continuous, then (A) can be replaced by

u(x-q,t+1)= u(x,t+1)-u(x,t+1)q+12D2u(x,t+1)q,q
-16D3u(x,t+1)[q,q,q]+01D4u(x-μq,t+1)[q,q,q,q](1-μ)36𝑑μ

and in such case (A) is replaced by

u(x-q,t+1)-u(x,t+ 1)+qIqI-u(x,t+1)q+12D2u(x,t+1)q,q
-16D3u(x,t+1)[q,q,q]+124ess supy[x,x-q]D4u(y,t+1)[q,q,q,q]

Since the adversary a is symmetric, q has the same distribution as -q. Therefore, 𝔼atqiqjqk=-𝔼atqiqjqk, for any i, j, and k. This implies 𝔼atqiqjqk=0 and consequently 𝔼atD3u(x,t+1)[q,q,q]=0. The remainder of the proof of Theorem 1 is the same except that we use the definition of K given in this Proposition.

C Proof of Theorem 3

Since vp is characterized by the dynamic program (3), we show by induction that vp(x,t)w(x,t)+E(t). The initial step follows from the inequality between vp and w at t=0, and the rest of the proof is similar to the oroof of Theorem 1. To prove the inductive step, we note that maxa𝔼p-1,a[w(x+r,0)]-w(x,-1)C. For t-2, we again note that r=qI𝟙-q and use the linearity of w in the direction of 𝟙:

maxa𝔼pt,a[w(x+r,t+1)]-w(x,t)
=maxa𝔼at[w(x-q,t+1)+ptq]-w(x,t+1)+w(x,t+1)-w(x,t) (17)

The equality above also uses the fact that under the rules of the game, q distributed according to at and I distributed according to pt are independent, conditionally on history. Since w(,t+1) is C2 with Lipschitz continuous second order derivatives, we again use Taylor’s theorem with the integral remainder

w(x-q,t+1)= w(x,t+1)-w(x,t+1)q+12D2w(x,t+1)q,q
-01D3w(x-μq,t+1)[q,q,q](1-μ)22𝑑μ (18)

The fact that pt=w(x,t+1) provides that ptq-w(x,t+1)q=0 for all q. Thus

w(x-q,t+1)+ptq-w(x,t+1) 12D2w(x,t+1)q,q
-16ess infy[x,x-q]D3w(y,t+1)[q,q,q]

Similarly,

w(x,t+1)-w(x,t)wt(x,t+1)-12ess infτ[t,t+1]wtt(x,τ) (19)

Also we note the following condition on the potential (4a). By collecting the above inequalities and using the definition of K,

maxa𝔼pt,a[w(x+r,t+1)]-w(x,t)K(t)=E(t)-E(t+1) (20)

Using the inequality (20), the inductive hypothesis w(x+r,t+1)+E(t+1)vp(x+r,t+1), and the dynamic program formulation of vp in (3), we obtain

w(x,t)+E(t) w(x,t)+maxa𝔼pt,aw(x+r,t+1)-w(x,t)+E(t+1)
maxa𝔼pt,a[vp(x+r,t+1)]=vp(x,t)

D Proof of Proposition 4

By definition, w is twice differentiable in x for all x and t<0. Then, the form w(x,t)=Φ(x)+ct, implies that w is so differentiable for all t. Therefore, we bound (17) using a Taylor expansion starting at t=-1, rather than t=-2. In this case, it suffices to show that K(t)=0 for all Tt-1. Noting that D2w(x,t)=D2Φ(x), we use Taylor’s theorem with the mean value form of the second-order (in x) remainder. Thus, (18) is replaced by

w(x-q,t+1)=w(x,t+1)-w(x,t+1)q+12D2Φ(y)q,q

for y=x-μq and some μ[0,1]. Since wt is constant, (19) is replaced by w(x,t+1)-w(x,t)=wt=c. Therefore, (20) is replaced by maxa𝔼pt,a[w(x+r,t+1)]-w(x,t)0. The rest of the proof of Theorem 3 is the same; it reveals that w(x,t)vp(x,t) for all Tt-1 and all x, as desired.

E Hessian of the Exponential Weights Potential

By a standard result, Φ is convex.1010 10 See, e.g., Sec. 3.1.5 in boyd_cvx_book. Therefore, D2Φ is a positive semidefinite matrix, and its quadratic form D2Φq,q is maximized at one of the extreme points {±1}N. Note that

ijΦ(x,t)={ψ′′(y)ϕ(xi)ϕ(xj)ifijψ′′(y)ϕ(xi)2+ψ(y)ϕ′′(xi)ifi=j

where

y=k=1Nϕ(xk),ψ(y)=1ηlog(y),ψ(y)=1ηy,ψ′′(y)=-1ηy2,
ϕ(xk)=eηxk,ϕ(xk)=ηeηxkandϕ′′(xk)=η2eηxk

Using these results, for all q{±}N

D2Φq,q =-η(k=1Neηxk)-2i,jeηxieηxjqiqj+η=η-ηpe,q2η

F Heat Potential Error Terms

In this Appendix, we compute the error terms for the heat potential φ given by (6). As a preliminary result, in Appendix F.1, we compute the spatial derivatives of φ up to the 4th order and determine their sign. In Appendix F.2, we determine the lower bound error term El.b.φ for an arbitrary adversary supported on {±1}N. Since φ is smooth, we use Proposition 2 for purposes of the lower bound. Finally, in Appendix F.3, we determine the upper bound error term Eu.b.φ.

F.1 Spatial Derivatives of the Heat Potential

Note that maxk(xk-yk) is differentiable almost everywhere and

imaxk(xk-yk)={1ifxi-yi>maxji(xj-yj)0ifxi-yi<maxji(xj-yj)

Therefore, the first derivatives are

iφ=αe-y22σ2𝟙xi-yi>maxjixj-yj𝑑y=αe-x-y22σ2𝟙yi>maxjiyj𝑑y0

and the second pure derivatives are

iiφ= -ασ2e-x-y22σ2(xi-yi)𝟙yi>maxjiyj𝑑y=-ασ2e-y22σ2yi𝟙xi-yi>maxjixj-yj𝑑y
= -ασ2N-1e-jiyj22σ2-xi-maxjixj-yje-yi22σ2yi𝑑yi𝑑y^i

where y^i is a vector in N-1 containing the same components as yN except yi. Since -xi-maxjixj-yje-yi22σ2yi𝑑yi<0, we have iiφ>0.

The second mixed derivatives are

ijφ= -ασ2e-x-y22σ2(xj-yj)𝟙yi>maxkiyk𝑑y=-ασ2e-y22σ2yj𝟙xi-yi>maxkixk-yk𝑑y
= -ασ2N-1e-kjyk22σ2𝟙xi-yi>maxki,jxk-ykxj-xi+yie-yj22σ2yj𝑑yj𝑑y^j

Since xj-xi+yie-yj22σ2yj𝑑yj>0, we have ijφ<0.

The third derivatives are

iiiφ= -ασ2e-x-y22σ2(1-(xi-yi)2σ2)𝟙yi>maxjiyj𝑑y
= -ασ2e-y22σ2(1-yi2σ2)𝟙xi-yi>maxjixj-yj𝑑y
ijjφ= -ασ2e-x-y22σ2(1-(xj-yj)2σ2)𝟙yi>maxkiyk𝑑y
= -ασ2e-y22σ2(1-yj2σ2)𝟙xi-yi>maxkixk-yk𝑑y

when i,j,k are all distinct (assuming N3),

ijkφ=ασ4e-x-y22σ2(xj-yj)(xk-yk)𝟙yi>maxliyl𝑑y
=ασ4e-y22σ2yjyk𝟙xi-yi>maxlixl-yl𝑑y
=ασ4N-2e-lj,kyl22σ2𝟙xi-yi>maxli,j,kxl-ylxj-xi+yie-yj22σ2yj𝑑yjxk-xi+yie-yk22σ2yk𝑑yk𝑑y^jk

Since xj-xi+yie-yj22σ2yj𝑑yjxk-xi+yie-yk22σ2yk𝑑yk>0, we have ijkφ>0.
where y^jk is a vector in N-2 containing the same components as yN except yi and yj.

The fourth derivatives are

iiiiφ= ασ4Ne-x-y22σ2(xi-yi)(3-(xi-yi)2σ2)𝟙yi>maxjiyj𝑑y
= ασ4Ne-y22σ2yi(3-yi2σ2)𝟙xi-yi>maxjixj-yj𝑑y
iijjφ= ασ4Ne-x-y22σ2(xi-yi)(1-(xj-yj)2σ2)𝟙yi>maxkiyk𝑑y
= ασ4Ne-y22σ2yi(1-yj2σ2)𝟙xi-yi>maxkixk-yk𝑑y
ijiiφ= ασ4Ne-y22σ2yi(3-yi2σ2)𝟙xj-yj>maxmjxk-yk𝑑y
ijjjφ= ασ4Ne-y22σ2yi(1-yj2σ2)𝟙xj-yj>maxmjxk-yk𝑑y
ijkkφ= ασ4Ne-y22σ2yi(1-yk2σ2)𝟙xj-yj>maxmjxk-yk𝑑y

and

ijklφ=-ασ6Ne-x-y22σ2(xj-yj)(xk-yk)(xl-yl)𝟙yi>maxmiym𝑑y
=-ασ6Ne-y22σ2yjykyl𝟙xi-yi>maxmixm-ym𝑑y
=-ασ6N-3e-lj,k,lyl22σ2𝟙xi-yi>maxmi,j,k,lxm-ym(n={j,k,l}xn-xi+yie-yn22σ2yn𝑑yn)𝑑y^jkl

where i, j, k and l are all distinct (i.e., assuming N4) and y^jkl is a vector in N-3 containing the same components as yN except yi,yj and yk. Since xn-xi+yie-yn22σ2yn𝑑yn>0, we have ijklφ<0.

F.2 Lower Bound Error: Heat Potential

To apply Theorem 1 with respect to an adversary supported on {±1}N associated with the heat potential φ, we determine the error term El.b.φ(t)=C+τ=t-2K(τ) where C is a constant satisfying φ(x,-1)-minp𝔼a-1,pφ(x+r,0)C for all x, and K is a function satisfying

12ess supτ¯[τ,τ+1]φtt(x,τ¯)+16ess supy[x,x-q]D3φ(y,τ+1)[q,q,q]K(τ)

for all τ[t,-2], all q in {±1}N and all x.

In the remainder of this Appendix F, let G denote an N-dimensional Gaussian random vector with mean 0 and identity covariance. In Appendix F.2.1, we show that |φ(x,-1)-φ(x+r,0)|C for all x and r where C=2+2κ𝔼maxiGi. The expression 𝔼maxiGi has a closed-form expression for N5. The asymptotically optimal upper bound for this quantity is 2logN (e.g, Lemmas A.12 and A.13 in cesa-bianchi06) and a sharper non-asymptotic upper bound for N7 is provided in dasgupta14. Therefore, C=O(κlogN).

In Appendix F.2.2, we prove that |φtt(x,τ)|K2|τ|32 for all x and τ-1 where K2=κ22𝔼[|N+2-G2|maxi|Gi|]. To bound K2, we use the fact that 𝔼[||G||2]=N, 𝔼[||G||4]=N(N+2)1111 11 𝔼[||G||2m]=0r2mrn-1e-r22𝑑r/0rn-1e-r22𝑑r can be computed explicitly using properties of the Gamma function. and 𝔼maxiGi22logN+2logN+1.1212 12 See, e.g. Example 2.7 in boucheron13. By Cauchy-Schwarz inequality:

𝔼[|N+2-G2|maxi|Gi|] 𝔼[(N+2-G2)2]𝔼[maxiGi2]
2(N+2)(2logN+2logN+1)

Therefore, K2=O(κNlogN).

In Appendix N.0.1, we show that |D3φ[q,q,q](x,t)|1|t|K3 for all q[-1,1]N where K3=1κ(32N+a𝔼maxi|1-Gi2|) and a=1 for N=2 and 2 for N3. To bound K3, note that 𝔼maxi|1-Gi2|𝔼maxiGi2+1, where the right-hand side is bounded as described in the preceding paragraph. Therefore, K3=O(Nκ).

Therefore, K(τ)=12K2|τ+1|32+K361|t+1| and

τ=t-2K(τ) =τ=t-212K2|τ+1|32+K361|t+1|s=1|t|-1K22s32+K36sK22+K36+s=1|t|-1K22s32+K36sds
=K22(3-2|t|-1)+K36(1+log(|t|-1))

The foregoing shows that for κ[12,1], El.b.φ(t)=O(NlogN+Nlog|t|) by Theorem 1.

Since φ is smooth, Proposition 2 is also available: to use it we identify a function K satisfying

12ess supτ¯[τ,τ+1]utt(x,τ¯)-124ess infy[x,x-q]D4u(y,τ+1)[q,q,q,q]K(τ)

for all τ[t,-2], all q{±1}N and all x. In Appendix F.2.4 we show that for q{±1}N, |D4φ(x,t)[q,q,q,q]K4(t)|t|32 where K4=22Nκ32(26+32N+4). Therefore, K(τ)=12K2|τ+1|32+124K4|τ+1|32 and τ=t-2K(τ)(K22+K424)(3-2|t|-1). This shows that for κ[12,1], El.b.φ(t)=O(NN) uniformly in t. Combining this with the result in the preceding paragraph, we obtain El.b.φ(t)=O(NNNlogN+Nlog|t|).

F.2.1 Bound on |φ(x,-1)-φ(x+r,0)|

We decompose the difference as follows

φ(x+r,0)-φ(x,-1)=maxi(x+r)i-maxixi+φ(x,0)-φ(x,-1)

Since r=qI𝟙-q[-2,2]N, we obtain -2maxi(x+r)i-maxixi2. Also since -maxi(x-y)i-maxixi+miniyi,

φ(x,0)-φ(x,-1) =αe-y22σ2maxixi-maxi(x-y)idyαe-y22σ2miniyidy=-σ𝔼maxiGi

where σ=2κ at t=-1. Thus, φ(x+r,0)-φ(x,-1)-2-2κ𝔼maxiGi. Similarly, since -maxi(x-y)i-maxixi+maxyi, we obtain φ(x+r,0)-φ(x,-1)2+2κ𝔼maxiGi.

F.2.2 Bound on |φtt|

For each t<0, it suffices to give a uniform upper bound of |utt(x,t)| over all xN. Since

ttφ=t(-κΔu)=-κΔ(tφ)=κ2Δ2φ

it suffices to bound Δ2φ=i,jiijjφ. By Appendix F.1

i,jiijjφ =iiiiiφ+jiiijjφ
=iασ4Ne-y22σ2yi(N+2-y2σ2)𝟙xi-yi>maxkixk-yk𝑑y
=ασ4Ne-y22σ2(N+2-y2σ2)iyi𝟙xi-yi>maxkixk-ykdy

Combining above with the fact that |iyi𝟙xi-yi>maxkixk-yk|maxi|yi|

|i,jiijjφ| 1σ3𝔼[|N+2-G2|maxi|Gi|]

Therefore, |ttφ(x,t)|1|t|32κ22𝔼[|N+2-G2|maxi|Gi|].

F.2.3 Bound on |D3φ[q,q,q]|

For each t<0, we bound |D3φ(x,t)[q,q,q]| uniformly in xN and q[-1,1]N. First, note that

D3φ[q,q,q]= i(iiiφqi2+3jiijjφqj2)qi+ijiki,jijkφqiqjqk
=i(-2iiiφqi2+3jijjφqj2)qi+ijiki,jijkφqiqjqk

We derive the following identity by linearity of φ along 𝟙:

ijiki,jijkφ= -iji(ijjφ+iijφ)=-2ijiiijφ=2iiiiφ

Using the fact that ijkφ>0 and this identity, for N3,

|D3φ[q,q,q]| 2i|iiiφ|+3i|jijjφqj2|+ijiki,jijkφ
=2i|iiiφ|+3i|jijjφqj2|+2iiiiφ
3i|jijjφqj2|+4i|iiiφ|

and for N=2,

|D3φ[q,q,q]|2i|iiiφ|+3i|jijjφqj2|

Using the formulas for third derivatives,

jijjφqj2=-cNσ2e-y22σ2(jqj2(1-yj2σ2))𝟙xi-yi>maxkixk-yk𝑑y

we obtain

i|jijjφqj2| cNσ2e-y22σ2|jqj2(1-yj2σ2)|𝑑y
= 1σ2𝔼|jqj2(1-Gj2)|

Using Jensen’s inequality and the independence of Gj,

𝔼|jqj2(1-Gj2)|𝔼(jqj2(1-Gj2))2
=Var(jqj2Gj2)=2jqj42N

Also,

i|iiiφ| ασ2Ne-y22σ2i|1-yi2σ2|𝟙xi-yi>maxjixj-yjdy
ασ2Ne-y22σ2maxi|1-yi2σ2|dy
= 1σ2𝔼[maxi|1-Gi2|]

Therefore, for all q[-1,1]N, |D3φ(x,t)[q,q,q]|1|t|C3 where C3=1κ(32N+a𝔼maxi|1-Gi2|) and a=1 for N=2 and a=2 for N3.

F.2.4 Bound of |D4φ[q,q,q,q]| for q{±1}.

For each t<-1, we bound D4φ[q,q,q,q] uniformly for all xN and q{±1}N. For distinct i,j and k by Appendix F.1 we have

ijiiφ+ijjjφ+ki,jijkkφ=ασ4Ne-y22σ2yi(N+2-y2σ2)𝟙xj-yj>maxmjxk-yk𝑑y

Also,

ij|ijiiφ|iασ4Ne-y22σ2|yi(3-yi2σ2)|𝑑yNσ3𝔼[|G(3-G2)|]

Since ijklφ<0 for distinct i,j,k,l (assuming N4) and D4φ[𝟙,𝟙,𝟙,𝟙]=0.

D4φ[q,q,q,q] iiiiiφ+3ijiiijjφ+2iji(ijiiφ+ijjjφ)qiqj
+6ijiki,jijkkφqiqj+ijiki,jli,j,kijklφ
= 2iji(ijiiφ+ijjjφ)(qiqj-1)+6ijiki,jijkkφ(qiqj-1)
= -4iji(iiijφ+ijjjφ)(qiqj-1)+6ijiijΔφ(qiqj-1)
-16ij|iiijφ|-24ij|ijΔφ|
-16Nσ3𝔼[|G(3-G2)|]-24ασ4Ne-y22σ2i|yi(N+2-y2σ2)|dy
= -8σ3(2N𝔼[|G(3-G2)|]+3𝔼[i|Gi(N+2-G2)|])
-8Nσ3(2𝔼[G2]𝔼[(3-G2)2]+3𝔼[G2]𝔼[(N+2-G2)2])
-22N(κ|t+1|)32(26+32N+4)

For N=2,3 the calculation is similar.

F.3 Heat Potential: Upper Bound Error Term

To apply Theorem 3 with respect to the player associated with the heat potential φ, we also need to determine the error term Eu.b.φ(t)=C+τ=t-2K(τ) where C is a constant satisfying maxa𝔼a,p-1φ(x+r,0)-φ(x,-1)C for all x and K is a function K

-12ess infτ¯[τ,τ+1]wtt(x,τ¯)-16ess infy[x,x-q]D3w(y,τ+1)[q,q,q]K(τ)

for all τ[t,-2], all q[-1,1]N and all x.

In Appendix F.2, we showed that |φ(x,-1)-φ(x+r,0)|C for all x and r where C=O(κlogN). Similarly, in that section we proved that |φtt(x,t)|K2|t|32 for all x and t<-1 where K2=O(κNlogN). Finally, we showed that |D3φ[q,q,q](x,t)|1|t|K3 for all q[-1,1]N where K3=O(Nκ). These results are also applicable in the upper bound setting and therefore for κ=1, Eu.b.φ(t)=O(NlogN+Nlog|t|).

G Comb Adversary

In this Appendix, we show that ΔφD2φqc,qc. Appendix G.1 shows that if xixjxl, then ijφilφ0. Using this result, Appendix G.2 shows that i<jijφqicqjc0, which implies the desired result.

G.1 Ordering of Mixed Derivatives of the Heat Potential

Note that xj-xi+yie-yj22σ2yj𝑑yj=σ2e-(xj-xi+yj)22σ2, and

-xi-maxki,j(xk-yk)e-yi22σ2(σ2e-(xj-xi+yj)22σ2)𝑑yi
=σ3π2e-(xi-xj)24σ2[erf[xi-xj+2yi2σ]]yi=-xi-maxki,j(xk-yk)
=σ3π2e-(xi-xj)24σ2erf[xi+xj-2maxki,j(x-y)k2σ+1]

Plugging the above into the expression for ijφ in Appendix F.1 for ij, we obtain

ijφ(x,t) =-cNσ2N-2e-ki,jyk22σ2-xi-maxki,j(xk-yk)e-yi22σ2(xj-xi+yie-yj22σ2yj𝑑yj)𝑑yi𝑑y^i,j
=-cNσπ2e-(xi-xj)24σ2N-2e-k=1N-2zk22σ2erf[xi+xj-2max1kN-2(x^k-zk)2σ+1]𝑑z

where x^ is a vector in N-2 containing the same components as x except for xi and xj.

Let φ=φ(x,t) be evaluated at arbitrary x and t<0 and let {(i)}[N] be the ranked coordinates defined in Section 5 associated with x. Showing that if xixjxl, then ijφilφ0 is equivalent to showing that if ijl, then (i)(j)φ(x,t)(i)(l)φ(x,t)0.

Note that if ijl, then x(i)+x(j)-2maxki,j(x(k)-z(k))xi+xl-2maxki,l(x(k)-z(k)) for all zN-2. Since erf is an increasing function, (i)(j)φ(x,t)(i)(l)φ(x,t)0, as desired.

G.2 Sign of i<jijφqicqj

We show that for qc chosen in accordance with the comb strategy ac, i<jijφ(x,t)qicqjc=i<j(i)(j)φ(x,t)q(i)cq(j)c0 (where the left hand side uses coordinates in an arbitrarily indexed canonical basis and the right-hand side uses ranked coordinates associated with x). If N is even,

i<j(i)(j)φq(i)cq(j)c= i:odd((i<j:even<N-(i)(j)φ+(i)(j+1)φ)-(i)(N)φ)
+i:even(i<j:odd<N-(i)(j)φ+ij+1φ)

Similarly, if N is odd,

i<j(i)(j)φq(i)cq(j)= i:odd(i<j:even<N-(i)(j)φ+(i)(j+1)φ)
+i:even((i<j:odd<N-(i)(j)φ+(i)(j+1)φ)-(i)(N)φ)

Both of these expressions are positive by the ordering of mixed partial derivatives established in Appendix G.1 and the fact that (i)(N)φ<0 for iN as shown in Appendix F.1.

H Lower Bound Heat Potential: Diffusion Factor

Note that φ(x+c𝟙,t)=φ(x,t)+c implies that iiφ=1, iiφ=-ijφ, and therefore D2φ𝟙=0. For N=2, this result and the fact that D2u is symmetric imply that D2u has the form

D2u=[a-a-aa]

It is straightforward to verify that 12𝔼ahD2φq,q=Δφ and therefore κh=1.

When N>2, 12𝔼ahD2φq,q=1|S|qSD2φq,q=D2φ,1|S|qSqqF where the set S is defined in Section 5 and ,F is the Frobenius inner product. Since S is permutation invariant, the off-diagonal entries of 1|S|qSqq are all equal and the diagonal entries are all equal to 1, and therefore, this expression is equal to (1-λ)I+λM for some constant λ where M=𝟙𝟙. Note that

1|S|qSqq,MF={1ifNis odd0ifNis even

which implies that λ=-1N if N is odd and -1N-1 if N is even. Using the fact that D2φ,MF=0, we obtain D2φ,1|S|qSqqF=(1-λ)Δu. This shows that 12𝔼ahD2φq,q=κhΔφ where κh=12(1-λ), as desired.

The foregoing proof is short and elementary. But to put the result in context, the only properties D2φ we used is that it is symmetric and has 𝟙 in the kernel. Therefore, for an arbitrary N×N matrix M with these properties, we showed that

2κhTrace(M)=𝔼ahMq,qmaxq{±1}NMq,q (21)

where the inequality follows from a probabilistic argument.

The Laplacian L(G) of an undirected graph G=(V,E) with |V|=N vertices is given by

L(G)ij={-wijifijkNwikifi=j

where wij0 is the weight of the edge (i,j)E. The sum of the edge weights of G is |E|=i<jwiju=12Trace(L(G)) and the maximum cut of G is maxcut(G)=maxq{±1}Ni<jwij1-qiqj2. Using the convention that wii=0,

maxqL(G)q,q=maxqi,j-wijqiqj+i[jwij]qi2
=maxqi,jwij(1-qiqj)=2maxqi<jwij(1-qiqj)=4maxcut(G)

where the feasible set of q is {±1}N.

For a graph Gu with each wij{0,1} (unweighted graph), it is known that (12+12N)|E|maxcut(Gu) (haglin). Since every Laplacian is symmetric and has 𝟙 in the kernel, the inequality (21) implies κh|E|Max-Cut(G) for a weighted graph. (Although, similarly to a graph Laplacian, the off-diagonal elements of D2φ are negative as shown in Appendix F.1, we did not use this property in our proof.1313 13 Therefore, our result is broader and also holds for matrices with arbitrary signs of off-diagonal elements, such as Laplacians of graphs with signed edge weights.)

I Upper Bound Heat Potential: Diffusion Factor

Appendix F.1 shows that ijφ<0 for ij and iiφ>0. Also the fact iiφ=1, which follows from linearity of φ in the direction of 𝟙, implies that i,jijφ=0. Thus, 12maxq[-1,1]ND2φq,q12Δφ-12ijijφ=Δφ.

J Proof of Claim 5

We prove Claim 5 as follows. In Appendix J.1, we compute the spatial derivatives of max potential ψ defined by (11) up to the third order for every x in the ranked coordinates {(i)}i[N], as defined in Section 5. In Appendix J.2 we prove that when the ranking changes, the second derivatives are continuous, and therefore, ψ is a C2 function of x. The third order spatial derivatives are defined almost everywhere (i.e., everywhere except where the ranking changes) and bounded. Therefore, the second order derivatives of ψ are Lipschitz continuous but ψ is not a C3 function of x. Finally, in Appendix J.3, we use these results to show that ψ satisfies (9).

J.1 Derivatives of the Max Potential

Note that

f(z)=erf(z2) andf′′(z)=2πexp(-z22)

Then for ij

(i)f(zl)={0 if l+1<i-l-2κtf(zl) if l+1=i1-2κtf(zl) if li and(i)(j)f(zl)={1(-2κt)f′′(zl) if jll2(-2κt)f′′(zl) if i=j=l+1l2κtf′′(zl) if i<j=l+10 if j>l+1

Therefore, the first derivatives are

(i)ψ ={1N+l=1N-1clf(zl) if i=11N+l=iN-1clf(zl)-(i-1)ci-1f(zi-1) if i2

Since x(1)x(2)x(N), we have 0z1z2zN-1 and therefore 0f(z1)f(z2)f(zN-1). As a consequence iψ0,i[N].

The second derivatives are

(i)(i)ψ ={1-2κtl=1N-1clf′′(zl) if i=11-2κt(l=iN-1clf′′(zl)+(i-1)2ci-1f′′(zi-1)) if 2iN-11-2κt(N-1)2cN-1f′′(zN-1) if i=N

or for i<j

(i)(j)ψ ={1-2κt(l=jN-1clf′′(zl)-(j-1)cj-1f′′(zj-1)) if j<N-1-2κt(N-1)cN-1f′′(zN-1) if j=N

The third derivatives are

(i)(i)(i)ψ={1(-2κt)l=1N-1clf′′′(zl)ifi=11(-2κt)(l=iN-1clf′′′(zl)-(i-1)3ci-1f′′′(zi-1))if2iN-112κt(N-1)3cN-1f′′′(zN-1)ifi=N

when ij,

(i)(j)(j)ψ={1(-2κt)(l=jN-1clf′′′(zl)+(j-1)2cj-1f′′′(zj-1))ifi<jN-11(-2κt)(N-1)2cN-1f′′′(zN-1)ifi<j=N1(-2κt)(l=iN-1clf′′′(zl)-(i-1)ci-1f′′′(zi-1))ifj<iN-11(2κt)(N-1)cN-1f′′′(zN-1)ifj<i=N

and when i<j<k

(i)(j)(k)ψ={1(-2κt)(l=kN-1clf′′′(zl)-(k-1)ck-1f′′′(zk-1))ifkN-112κt(N-1)cN-1f′′′(zN-1)ifk=N

J.2 ψ is C2 with Lipschitz Continuous Second Order Spatial Derivatives

First, we show that the function ψ defined by (11) is C2 in the spatial variables x1,,xN. Since (11) uses ranked coordinates, we can view ψ as being defined in the sector {x1x2xN} then extended by symmetry to all N.

The heart of the matter is the observation that at each plane xk=xk+1 the normal derivative of ψ is zero. Indeed, when x1x2xN the formula (11) involves two sums, i=1Nxi and l=1N-1clf(zl). The former certainly has normal derivative equal to zero at each of the sector’s faces xk=xk+1, so we may concentrate on the latter. Since z1=x1-x2 while z2,,zN-1 are symmetric in x1 and x2, at the face x1=x2 (equivalently, z1=0) the normal derivative is a multiple of f(0), which vanishes since f(z) is an even function of z (see (10)). Turning to the face xk=xk+1 with k2, we observe that z1,,zk-2 do not involve xk or xk+1 while zk+1,,zN-1 are symmetric in xk and xk+1; moreover xk=xk+1 is equivalent to zk-1=zk. Therefore the normal derivative of ψ is a multiple of

ckf(zk)-(k-1)ck-1f(zk)+kckf(zk)=0

using the fact that ck=1k(k+1).

To explain why this observation implies the C2 continuity of ψ, it suffices to consider the restriction of ψ to {x1++xN=0} (since ψ(x1+c,,xN+c,t)=ψ(x,t)+c). Changing variables to yk=xk-xk+1 (1kN-1), the C2 character of ψ follows from the following calculus lemma applied to g(y1,yN-1)=ψ(x1,,xN,t) for any fixed t.

Lemma 6

For any m1, let g(y1,,ym) be C2 on the positive quadrant {yi0for each i}, and assume that ig=0 at the face yi=0. Then the symmetric extension of g,

g(y1,,ym)=g(|y1|,,|ym|),

is C2 on all m.

Proof  The case m=1 is familiar: for y1<0 we have g(y1)=-g(-y1) and g′′(y1)=g′′(-y1). If g(0)=0 then g and its first and second derivatives match at y1=0, and it follows that g is C2.

The case m=2 similar. At the face y1=0 of the positive quadrant we have 1g(0,y2)=0 by hypothesis, and therefore 12g(0,y2)=0 by differentiation with respect to y2; similarly, 12g=0 at the face y2=0. It follows that the first and second derivatives of the extension of g are all continuous across the planes y1=0 and y2=0. So g is C2.

The general case is essentially the same. To see that ig and iig are continuous it suffices to apply the argument used for m=1 along the line obtained by holding all variables except yi constant. To see that ijg is continuous for ij it suffices to apply the argument used for m=2 in the plane obtained by holding all variables except yi and yj constant.  


We next show ψ is not C3. Suppose x1>x2>x3>x4>xN then since ψ(x1,x2,x3,xN)=ψ(x1,x3,x2,xN) we have 222ψ(x1,x2,x3,xN)=333ψ(x1,x3,x2,xN). However

222ψ(x1,x2,x3,xN)-333ψ(x1,x2,x3,xN)=32f′′′(z2)-12f′′′(z1)

which does not approach to 0 when x approaches to {x1>x2=x3>>xN}. This means 333ψ cannot be continuously extended to the boundary {x1>x2=x3>>xN}.

Finally, we show the boundedness of third order derivatives. Note that for z0,

-2eπf′′′(z)=-2πze-z220

From Appendix J.1 we have

{12κt2eπ(1i-1N)(i)(i)(i)ψ1(-2κt)2eπ(i-1)2i12κt2eπ(1-1N)(i)(j)(j)ψ0ifi<j12κt2eπ(1i-1N)(i)(j)(j)ψ1(-2κt)2eπ1iifi>j12κt2eπ(1k-1N)(i)(j)(k)ψ1(-2κt)2eπ1kifi<j<k

J.3 Max Potential ψ Satisfies (9)

First, note that limzf(z)/z=1, and therefore, the final value condition is satisfied.

limt0ψ(x,t) =1Nx,𝟙+i=1N-11i(i+1)((j=1ix(j))-ix(i)+1)
=1Nx,𝟙+j=1N-1x(j)(i=jN-11i(i+1))-i=1N-1x(i)+1i+1
=1Nx,𝟙+j=1N-1x(j)(1j-1N)-i=2Nx(i)i
=1Nx,𝟙+x(1)-(j=1N-1x(j)N)-x(N)N
=x(1)=maxi(xi)

Since x(1)x(2)x(N), we have 0z1z2zN-1 and, therefore, 2πf′′(z1)f′′(z2)f′′(zN-1)0. This by a straightforward computation gives for iN-1,

(i)(i)ψ-(i+1)(i+1)ψ=(1-1i)(f′′(zi-1)-f′′(zi))0

Therefore, maxii2ψ=(1)2ψ=(2)2ψ. Finally, ψt=-κ-2tl=1N-1clf′′(zl) and thus ψt+κ(1)2ψ=0.

K Lower Bound Max Potential: Diffusion Factor

The linearity in the direction of 𝟙 implies that iiψ=1, and therefore D2ψ𝟙=0. Suppose x(1)=xi, in Appendix J.3 we show that maxjj2ψ=i2ψ. Therefore, ±D2ψqm,qm=±D2ψ±(qm,qm)=4iiψ=4maxjjjψ.

L Upper Bound Max Potential: Diffusion Factor κm

We note that since f is convex, ψ is convex. Therefore, maxq[-1,1]ND2ψq,q is attained at the vertices of the hypercube {±1}N. Without loss of generality we assume x1x2xN. From Appendix J, we see that D2ψ has a special structure: ikψ=jkψ for all i,j<k and ijψikψ0 for i<j<k. In the remainder of this Appendix we use this structure to prove that a class of simple rank-based strategies maximizes the quadratic form maxq{±1}ND2ψq,q,1414 14 This class includes the comb strategy. and compute the κm such that

12maxq{±1}ND2ψq,qκmmaxiiiψ

From Appendix J.1 we know that for i<j ijψ is a function of j alone, thus we denote aj=-ijψ for any i<j. Also,

ijψ=1-2κt(l=jN-1clf′′(zl)-(j-1)cj-1f′′(zj-1))1-2κtf′′(zj)-f′′(zj-1)j0

and for i<j<k

ijψ-ikψ= 1-2κt((l=jk-1clf′′(zl))-(j-1)cj-1f′′(zj-1)+(k-1)ck-1f′′(zk-1))
1-2κt(f′′(zj)(1j-1k)-f′′(zj-1)j+f′′(zj)k)0

thus a2a3aN0.

Theorem 7

For the max potential ψ on {x|x1x2xN}, maxq{±1}ND2ψq,q is obtained by strategies satisfying q2i-1+q2i=0, 2iN. Specifically, comb strategy qc achieves the maximum.

Proof  As shown in Appendix H, we can view D2ψ as the Laplacian of an undirected weighted graph G with N vertices. The edge weight wij=-(i)(j)ψ=aj for i<j and a2a3aN. Also, as shown

maxq{±1}ND2ψq,q=4max_cut(G)

Thus, we converted the problem of maximizing a quadratic to the problem finding the max cut for a special weighted graph. The Theorem proved below gives us the desired result.  


Theorem 8

Consider an undirected graph with vertices {1,,N} satisfying for any edge (i,j) the weight depends on max(i,j), i.e. we can write wij=aj for i<j. Also suppose a2a3aN , then the max cut, modulo permutations between vertices (i,j) such that ai=aj, is any cut dividing 2i-1 and 2i for all 1iN2.

Proof  Without loss of generality, assume a2>a3>aN. We use induction on N. For N=2 and N=3 it is straight forward to check that the max cut is any cut dividing 1 and 2.

For N+1 points, we first prove the max cut must divide 1 and 2.

Lemma 9

Any max cut must divide 1 and 2.

Proof [Proof of lemma 9] Assume a max cut doesn’t divide 1 and 2, denote

{L={i{3,,N}|i on the same side as 1 and 2}R={i{3,,N}|i on the other side}

by definition R is nonempty.

Define AL=jLaj and AR=jRaj. If AR<AL+a2 then by moving 2 to R the cut will get bigger since

T({1}L,{2}R)=T({1,2}L,R)+a2+AL-AR>T({1,2}L,R)

which is a contradiction.

So ARAL+a2. We denote pi={2i-1,2i}, 2iN2. If no pi satisfies piR, then

AR-AL(a3-a4)+(a5-a6)+<a2

which is a contradiction. Thus, we can assume pk is the smallest set contained in R. We prove that by moving 2 to R and 2k-1 to L the cut will get bigger. Actually

T({1,2k-1}L,{2}R{2k-1})= T({1,2}L,R)+a2+(|R2k-1|-|L2k-1|-1)a2k-1
+AL2k-1-AR2k-1

where

{L2k-1=L{3,,2k-1}R2k-1=R{3,,2k-1}

and AL2k-1, AR2k-1 are defined under the same convention as AL, AR.

By definition of k for any pi such that 2ik-1, if one of the element is in R2k-1 then the other must be in L2k-1. Suppose R2k-1 contains elements of pi1,,pi|R2k-1|-1 and 2k-1, then

a2+(|R2k-1|-|L2k-1|-1)a2k-1+AL2k-1-AR2k-1
a2+j=1|R2k-1|-1a2ij-j=1|R2k-1|-1a2ij-1-a2k-1 (22)
 +(lL2k-1j=1|R2k-1|-1pijal) (23)
 -(|L2k-1|-|R2k-1|+1)a2k-1 (24)

We can rearrange the sum in (22)

a2+j=1|R2k-1|-1a2ij-j=1|R2k-1|-1a2ij-1-a2k-1
= (a2-a2i1-1)+(a2i1-a2i2-1)+(a2i|R2k-1|-1-a2k-1)>0

Also notice that each al>a2k-1 for lL2k-1j=1|R2k-1|-1pij and

|L2k-1j=1|R2k-1|-1pij|=|L2k-1|-|R2k-1|+1

which implies that (23) plus (24) is positive. This demonstrates that the new cut is strictly bigger which is a contradiction.  


Returning to the proof of Theorem 8, denote

Si={j