Predictive Online Convex Optimization

  • 2019-11-29 16:48:51
  • Antoine Lesage-Landry, Iman Shames, Joshua A. Taylor
  • 0

Abstract

We incorporate future information in the form of the estimated value offuture gradients in online convex optimization. This is motivated by demandresponse in power systems, where forecasts about the current round, e.g., theweather or the loads' behavior, can be used to improve on predictions made withonly past observations. Specifically, we introduce an additional predictivestep that follows the standard online convex optimization step when certainconditions on the estimated gradient and descent direction are met. We showthat under these conditions and without any assumptions on the predictabilityof the environment, the predictive update strictly improves on the performanceof the standard update. We give two types of predictive update for variousfamily of loss functions. We provide a regret bound for each of our predictiveonline convex optimization algorithms. Finally, we apply our framework to anexample based on demand response which demonstrates its superior performance toa standard online convex optimization algorithm.

 

Quick Read (beta)

Predictive Online Convex Optimization\thanksreffootnoteinfo

[    [    [
Abstract

We incorporate future information in the form of the estimated value of future gradients in online convex optimization. This is motivated by demand response in power systems, where forecasts about the current round, e.g., the weather or the loads’ behavior, can be used to improve on predictions made with only past observations. Specifically, we introduce an additional predictive step that follows the standard online convex optimization step when certain conditions on the estimated gradient and descent direction are met. We show that under these conditions and without any assumptions on the predictability of the environment, the predictive update strictly improves on the performance of the standard update. We give two types of predictive update for various family of loss functions. We provide a regret bound for each of our predictive online convex optimization algorithms. Finally, we apply our framework to an example based on demand response which demonstrates its superior performance to a standard online convex optimization algorithm.

\savesymbol

AND

ucb]Antoine Lesage-Landry, is]Iman Shames, ut]Joshua A. Taylor

aEnergy & Resources Group, University of California, Berkeley, USA 

bDepartment of Electrical and Electronic Engineering, University of Melbourne, Parkville, Australia 

cThe Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada 


Key words:  Convex optimization; learning algorithms; machine learning; power systems; renewable energy systems; load dispatching

 

11footnotetext: This work was partly done while A. Lesage-Landry was visiting The University of Melbourne, Australia. This work was funded by the Fonds de recherche du Québec – Nature et technologies, the Ontario Ministry of Research, Innovation and Science and the Natural Sciences and Engineering Research Council of Canada. This paper was not presented at any IFAC meeting. Corresponding author A. Lesage-Landry Tel.: +1 416-978-6842; fax: +1 416-978-1145

1 Introduction

Online convex optimization (OCO) has found applications in fields like network resource allocation [8, 7, 6] and demand response in power systems [20, 21]. It is used for sequential decision-making when contextual information or feedback is only revealed to the decision maker at the end of the current round. Theoretical results showing that OCO algorithms have bounded regret guarantee the performance of these algorithms under mild assumptions.

In many applications, the decision maker has access to both revealed past information and estimates about future rounds. For example, in power systems, weather forecasts or historical load patterns can be used to estimate the future regulation needs [22, 4]. In this work, we present the predictive online convex optimization (POCO) framework. POCO works under the assumption that an estimate of the gradient of the loss function for the next round is available to the decision maker. In POCO, a standard OCO update is first applied using past information to compute the next decision. Then, the decision maker checks the quality of the estimated information available to them. If the estimated gradient is considered accurate enough, the decision maker implements an additional projected gradient step based on the estimated gradient to improve their decision for this round. This last step is referred as the predictive update.

We introduce explicit criteria for determining if the quality of the estimated gradient is high enough to guarantee an improvement over a standard OCO step when the predictive update is applied. A regret bound is obtained for all our algorithms. We conclude this work by presenting numerical examples where a POCO algorithm is used to improve on the performance of demand response with standard OCO. This example is motivated by the fact that a load aggregator often has access to an estimate of the power imbalance they have to counteract for regulation purposes.

Literature review. Recent work in online convex optimization has focused on including prior or future information. Reference [28], which builds on [9], assumes that the problem’s unknown and uncertain parameters follow a predictable process plus some noise [27] for their OCO algorithm. As in our setting, a second update with an estimated gradient-like term follows a mirror descent update. This second update is used by the algorithm in every step regardless of the quality of the estimated gradient. For this reason, the algorithm is referred to as optimistic. Optimistic algorithms were also studied in [31, 23, 34]. No conditions are provided about the estimated gradient in this case except that it comes from past observations and/or side information via an oracle. The authors of [28] show that the optimistic mirror descent can lead to a tighter bound than a standard online mirror descent algorithm if the process is indeed predictable. In [19], the authors provide a dynamic regret bound for the optimistic mirror descent. There is, however, no guarantee that in a given round the optimistic update does not do worse than the standard OCO update. An algorithm similar to [28] is given in [18]. In their work, they make the stronger assumption in which the exact gradient of the next round loss function is available and then provide a static regret bound for their setting. This differs from our setting in that we provide dynamic regret-bounded algorithms and use an estimated gradient which entails less restrictive assumptions. Several other authors have studied different ways to incorporate future information in OCO like using state information [17] or the direction of the loss function’s gradient in an online linear optimization setting [10].

The projected gradient descent, inexact gradient descent, and proximal algorithms [1, 29, 2] from conventional convex optimization resemble our setting. These algorithms differ from ours because they aim to minimize the same objective function throughout all descent steps. In OCO, we minimize a sequence of objective functions {ft}t=1T and at each time t provide a decision to minimize the current loss function. The loss function in a given round is only observed after we have committed to a decision. OCO will be introduced formally in Section 2.

Model predictive control (MPC) [14, 3] is another widely-used sequential decision-making framework. In MPC, the decision maker solves to optimality a receding horizon optimization problem that relies on models of future round loss functions. This thus requires significantly more contextual information and computational resources. These limitations are absent in OCO, making it a more suitable tool for real-time decision making with small computational resources.

Because we characterize conditions under which the predictive step improves performance, we guarantee improvement over conventional OCO and require no predictability assumptions. These conditions can be checked at each round of OCO, and if satisfied, the predictive update is implemented. In sum, in this work we make the following contributions:

  • We introduce a novel predictive online convex optimization framework and provide conditions for when to use side information.

  • We propose a predictive update with a predetermined step size for loss functions that have a Lipschitz gradient. We show that this update leads to a strict improvement over an OCO update when used (Section 4).

  • We give a predictive update with backtracking line search that applies to a broader family of problems. We show that it leads to strict improvement over an OCO update (Section 5).

  • We obtain sublinear regret bounds in the number of rounds for all algorithms.

  • We apply our framework to demand response in power systems and find that it outperforms a standard OCO algorithm (Section 6).

2 Background

In OCO, one must make a decision at each round to minimize their cumulative loss [30, 15]. The current round’s loss function and any other round-dependent parameters are not available at the moment when the decision is made. Only information about previous rounds can be used to make the decision. Once the decision has been made, information about the current round is observed.

Let t denote the current round index and T be the time horizon. Let 𝒳N, N, be the decision set, and let 𝐱t𝒳 be the decision variable at time t. We denote the differentiable convex loss function by ft(𝐱t) for t=1,2,,T. Let be the Euclidean norm. We denote the projection operator onto the set 𝒴 as proj𝒴(𝐱)argmin𝐲𝒴𝐱-𝐲.

The goal of the decision maker is to sequentially solve the following sequence of problems:

min𝐱t𝒳ft(𝐱t) (1)

for t=1,2,,T. The decision maker observes the loss function ft after choosing 𝐱t. For this reason, even if the loss function has a simple form, an analytical solution to the round optimization problem (2) is not obtainable. The decision 𝐱t is computed using a gradient descent-based [35], mirrored descent-based [13] or Newton step-based rule [16]. For example, in the online gradient descent (OGD[35] algorithm, the decision at round t+1, 𝐱t+1, is given by the update:

𝐱t+1=proj𝒳(𝐱t-ηft(𝐱t)), (2)

where ηT-1/2 to guarantee a sublinear upper bound on the dynamic regret of OGD [35, Theorem 2].

Throughout this work, we make the following assumptions [35, 30, 15].

Assumption 1

The set X is convex and compact.

The decision set 𝒳 represents all constraints on 𝐱t. In this version of OCO, we only consider time-invariant constraints.

Assumption 2

The loss function is B-bounded: |ft(xt)|B for t=1,2,,T and B<.

Assumption 3

The gradient of the loss function is G-bounded: ft(xt)G for t=1,2,,T and G<.

As a consequence of Assumption 1, the decision variable is also X-bounded: 𝐱tX for t=1,2,,T. We define the diameter of the compact set 𝒳 as diam𝒳=sup{𝐱-𝐲|𝐱,𝐲𝒳}and let D=diam𝒳, a positive scalar. The remainder of the assumptions will be stated when a specific technical result requires it.

The design tool of OCO algorithms is the regret [30, 15]. In this work, we use the dynamic regret [8, 19, 35, 24]:

𝚁𝚎𝚐Td=t=1Tft(𝐱t)-ft(𝐱t), (3)

where 𝐱targmin𝐱𝒳ft(𝐱). The dynamic regret compares the loss suffered by the decision maker to optimal performance in each round. Other versions of the regret exists, e.g., static regret [30, 15, 35], which is defined in terms of the optimal stationary decision, 𝐱argmin𝐱𝒳t=1Tft(𝐱) in (2). In this work, we only consider the dynamic regret because it yields a stronger theoretical guarantee. This theoretical guarantee is also more relevant in the context of time-varying optimization. For this reason, we refer to the dynamic regret, 𝚁𝚎𝚐Td, simply as the regret. Note that a bounded dynamic regret implies a bounded static regret [8]. The goal when designing an OCO algorithm is to show that the regret is sublinearly bounded above in the number of rounds. An OCO algorithm with a sublinearly bounded dynamic regret in the number of rounds will on average perform as well as the round optimal decision at each round [30, 15, 16].

We conclude this section by defining the quantity VT=t=2T𝐱t-𝐱t-1. The term VT quantifies the variation of the optimal predictions through all rounds.

3 Predictive OCO

We now introduce our POCO framework. We let 𝐱¯t+1𝒳 be the decision computed by an OCO algorithm in round t. This OCO algorithm can be, for example, the aforementioned OGD. The decision 𝐱¯t+1 is then given by the update (2). In POCO, we consider an ϵ-forecaster introduced in Assumption 4. Let 𝐠t(𝐱¯t+1)N be the estimated gradient of the loss function ft+1 at 𝐱¯t+1.

Assumption 4 (ϵ-forecaster)

The ϵ-forecaster has access to an estimate of the gradient of the next round’s loss function evaluated at x¯t+1, gt(x¯t+1), and the maximum estimation error, ϵ>0, such that gt(x¯t+1)-ft+1(x¯t+1)ϵ for all time t=1,2,,T.

In other words, we consider a forecaster that has access to limited information about the next round in the form of 𝐠t(𝐱¯t+1). This could represent a prediction based on historical data, e.g., a weather or demand forecast. The decision maker uses this information to improve on the OCO update. For conciseness, we denote the estimated gradient by 𝐠t. We omit its dependency on 𝐱¯t+1 because it is always evaluated at the OCO update output, 𝐱¯t+1, and no other points. The decision maker can meet Assumption 4 by relying on an exogenous model to estimate the gradient ft+1(𝐱¯t+1). In the context of demand response, historical data of the load’s consumption and generator output’s patterns, weather history and the historical values of the gradient, for example, can be used to build a statistical model to estimate the value of the ft+1 at the decision given by OCO update. The parameter ϵ can then be set according to, for example, a high confidence interval or a worst-case performance parameter. The forecaster would then provide 𝐠t using this model.

Then, if certain conditions are met, the following update rule for our proposed POCO algorithm is used.

Definition 1 (Predictive update)

Let βt>0 be an appropriately chosen step size. The predictive update is

𝐱t+1=proj𝒳(𝐱¯t+1-βt𝐠t). (4)

The predictive update is to be used directly after the OCO update and will lead to a strict improvement over the OCO update under certain conditions. The aforementioned conditions will be discussed in the next sections and depend on the properties of the loss function. If the conditions are not met, 𝐱¯t+1 is directly used. Let δ>0 be the desired improvement when using the predictive update. We define the counter ct:

ct+1={ct+1if 𝐱t+1-𝐱¯t+1δctotherwise

with c0=0. The variable ct represents the number of predictive updates as described in Definition 1. Let ν=cT/T be the ratio of rounds using the predictive update to the total number of rounds.

Depending on the loss function, any regret-bounded OCO update can be used in the POCO framework. Back to the OGD example, the predictive OGD uses the update (2) and if certain conditions are met, 𝐱t+1 is provided by (1) and if not, 𝐱t+1=𝐱¯t+1. We write 𝐱t+1(βt)=proj𝒳(𝐱¯t+1-βt𝐠t) as a function of the step size βt>0 and let 𝐝t+1=𝐱t+1(βt)-𝐱¯t+1 be the descent direction.

Next, we provide sufficient conditions for the estimated gradient 𝐠t to be a feasible descent direction. Later, we consider the step size selection problem. Particularly, two cases are considered where (i) the step sizes are constant and chosen a priori based on a property of the loss functions, or (ii) the step sizes are selected through the application of a backtracking line search that enforces a modified online version of the Armijo condition [2].

The following lemma introduces a sufficient condition for the estimated gradient 𝐠t to be a descent direction of the OCO problem (2).

Lemma 1 (Estimated descent direction)

The vector -gt provided by the ϵ-forecaster is a descent direction for ft+1(x¯t+1) if gt>ϵ.

The proof of Lemma 1 is presented in Appendix A. The next lemma is adapted from [2] and ensures that the predictive step follows a feasible descent direction.

Lemma 2 (Feasible estimated descent direction)

For all βt>0 and x¯t+1X, if gt>ϵ and xt+1(βt)x¯t+1 , then xt+1(βt)-x¯t+1 is a feasible descent direction at x¯t+1 and gt(xt+1(βt)-x¯t+1)-1βtxt+1(βt)-x¯t+12.

Similarly, the proof of Lemma 2 is given in Appendix B.

4 POCO with fixed step size

We now present a predictive update where step sizes βt are fixed and based on a propriety of the sequence of loss functions. We conclude this section by providing regret-bounded algorithms using these updates. In this section, we add the following assumption:

Assumption 5

Let L<. The loss function ft(x) has an L-uniformly Lipschitz-continuous gradient: ft(x)-ft(y)Lx-y for all t=1,2,,T and x,yX.

We propose a predictive update with fixed step size next. We state sufficient conditions that guarantee a strict improvement over an OCO update. These sufficient conditions can be checked at each round to determine if the estimated information is accurate enough, and therefore if the predictive update should be used in the current round.

Lemma 3 (Predictive update with fixed step size)

Suppose that Assumption 5 holds and gt>ϵ. If β1L and dt+1=projX(x¯t+1-βtgt)-x¯t+1ϵL+ϵ2L2+2δL, then the predictive update (1) used by the ϵ-forecaster strictly improves on the OCO update and the improvement is bounded below by δ>0.

The proof of Lemma 3 is provided in Appendix C. We now present regret bounds for POCO algorithms. This algorithm uses the predictive update with fixed step size to improve the performance of OCO algorithms.

Theorem 1 (POCO regret bound)

Consider an OCO algorithm with a sublinear regret upper bound. Suppose that the forecaster uses the predictive update (1) only at rounds t when the estimated gradient gt and feasible descent direction dt+1=projX(x¯t+1-βtgt)-x¯t+1 satisfy the assumptions of Lemma 3. If the ratio of rounds satisfying these assumptions is greater than ν, then the regret of the POCO algorithm is bounded above by

RegTd(POCO)RegTd(OCO)-Tνδ.

The proof of Theorem 1 is presented in Appendix D. This theorem leads to the following corollary which provides a regret bound for the OGD with predictive updates (POGD).

Corollary 1 (O(T) regret bound for POGD)

Suppose that the ratio ν of rounds that respects the assumptions of Lemma 3 is ν>1T. Then the predictive OGD algorithm’s regret is bounded above by

RegTd(𝙿𝙾𝙶𝙳) RegTd(𝙾𝙶𝙳)-δT,
=(7X24+G22+XVT-δ)T,

which is sublinear and tighter than the OGD regret bound.

The corollary follows from substituting 𝚁𝚎𝚐Td(OGD) from [35] and ν>1/T in Theorem 1.

5 POCO with backtracking line search

In this section, we do not require Assumption 5 to hold. We however use the following proposition:

Proposition 1

The loss function ft(x) is Δ-time-Lipschitz with Δt(x),Δ<, that is:

|ft(𝐱)-ft+τ(𝐱)|Δt(𝐱)|τ|Δ|τ|

for all τ{iZ|0t+iT} at all t=1,2,,T, and all xX.

Proposition 1 always holds because Assumption 2 implies that Δ=2B/|τ| is sufficient. Under Proposition 1, we consider functions that are (t,𝐱)-locally and globally Lipschitz in their time argument, respectively, for the intermediary bound (Δt(𝐱)) and the upper bound (Δ). This can represent, for example, loss functions like squared tracking error functions, in which the time-varying targets are always contained in a closed set.

In the case of the POCO with backtracking (POCOb), we re-express the update (1) given in Definition 2. The backtracking line search for predictive update is given in Algorithm 2.

Definition 2 (POCOb update)

Let ζ be a positive scalar and βm be determined by a backtracking line search algorithm. The predictive update with backtracking line search is

𝐱t+1=𝐱¯t+1+βm(proj𝒳(𝐱¯t+1-ζ𝐠t)-𝐱¯t+1) (5)
\@float

algorithm[tb] \[email protected]@algorithmic[1] \STATEParameters: Given β(0,1) and M. \STATEInitialization: Set ζ>0.

\STATE

𝐝t+1=proj𝒳(𝐱¯t+1-ζ𝐠t)-𝐱¯t+1 \STATEm=0. \WHILEft(𝐱¯t+1+βm𝐝t+1)>ft(𝐱¯t+1)+βm(𝐠t𝐝t+1-ϵ𝐝t+1)-2Δ and mM \STATEm=m+1.

\ENDWHILE\IF

m>M

\STATE

β=0.

\ENDIF

Backtracking algorithm for predictive gradient projection

The next lemma shows that the backtracking line search-based predictive update improves on the OCO update. Our claim relies on the modified Armijo condition for gradient projection. This condition ensures a sufficient decrease in the objective when using an estimated gradient projection descent direction [33]. We adapt this condition to the estimated gradient and online setting. The modified Armijo condition for gradient projection [2] on ft+1 and feasible descent direction 𝐝t+1=proj𝒳(𝐱¯t+1+ζ𝐠t)-𝐱¯t+1 for some ζ>0 with step size βm is given by:

ft+1(𝐱¯t+1+βm𝐝t+1) ft+1(𝐱¯t+1) (6)
+βmft+1(𝐱¯t+1)𝐝t+1.
Lemma 4 (Sufficient decrease of POCOb update)

Suppose gt>ϵ. If Algorithm 2 terminates to a step size βm>0, then the predictive update with backtracking line search (5) used by the ϵ-forecaster satisfies the modified Armijo condition (6), and leads to a sufficient decrease in the loss function.

The proof of the lemma can be found in Appendix E.

Remark 1

Algorithm 2 ensures that when β0, βm satisfied:

ft(𝐱¯t+1+βm𝐝t+1) ft(𝐱¯t+1)+βm𝐠t𝐝t+1
-βmϵ𝐝t+1-2Δ (7)

Every element of (7) is available at time t, which is not the case in (6). This allows us to use a backtracking line search algorithm to determine βt in an OCO setting. Algorithm 2 also ensures that the step size is not too small (cf. [33, Section 3.1]).

Note that there is an additional ϵ𝐝t+1 term in the modified Armijo condition for estimated gradient projection. This is a consequence of not having access to the exact gradient of ft. Hence, to ensure that the update is valid, the modified Armijo condition is augmented by a term proportional to the error of the estimated gradient. The second additional term, 2Δ, is due to the time-varying setting of OCO.

We now discuss the existence of step sizes βt that satisfy (7) at round t. Before stating the main result, for a given 𝐱¯t+1 and 𝐠t, define the set of step sizes that comply with line 5 in the line search algorithm, which is the modified Armijo condition for online settings (7):

𝒮={β>0| ft(𝐱¯t+1+β𝐝t+1)ft(𝐱¯t+1)
+β𝐠t𝐝t+1-βϵ𝐝t+1-2Δ}.
Theorem 2

Suppose dt+1=projX(x¯t+1-ζgt)-x¯t+10 is a feasible descent direction and ft is bounded below for all t. Then there exists xX such that ft(x¯t+1)-ft(x)>2Δ if and only if S.

The proof for Theorem 2 is presented in Appendix F. We note that Theorem 2 does not guarantee that the backtracking algorithm, Algorithm 2, will find a non-zero step size. Other techniques like exact line searches, might be required to identify an adequate step size in some problem instances. Using Theorem 2, we can provide a lower bound on the improvement of the predictive update with backtracking line search.

Corollary 2 (POCOb update improvement)

Suppose that the assumptions of Lemma 4 hold and βm>0, then the predictive update with backtracking line search improves on the OCO update by a minimum of 2Δ.

The proof of Corollary 2 is given in Appendix G. We now state a regret bound for the POCOb algorithm.

Theorem 3 (POCOb regret bound)

Consider an OCO algorithm with bounded regret. Suppose that the assumptions of Lemma 4 are met. If the ratio of rounds with β>0 and satisfying these assumptions to T is greater than ν, then the regret of the POCO algorithm with backtracking used by the ϵ-forecaster is bounded above by

RegTd(POCOb)RegTd(OCO)-2TνΔ (8)

and thus outperforms the OCO algorithm.

The proof of Theorem 3 is presented in Appendix H.

Remark 2

Note that if the locally Lipschitz statement of Proposition 1 is used, then 2Δ is replaced by Δt,1(x¯t+1+βmdt+1)+Δt,1(x¯t+1) in the modified Armijo condition for online settings (7), and the bound (8) can be recomputed accordingly.

6 Example

In this section, we apply POCO algorithms to demand response (DR) in power systems [5, 26], specifically regulation and curtailment. At each time step, a DR aggregator sends instructions to their loads to follow a regulation signal, e.g., a power imbalance due to a sudden change in renewable power generation [4, 32]. Each load responds to the signal by adjusting its power consumption. The power consumption is constrained by a storage capacity, which could represent physical storage like a battery or the load’s limits, e.g., thermal constraints. The regulation signal is unknown at the time the DR instructions are sent. This can be due, for example, to a drop in renewable power generation which is only assessed after the generator has committed to some amount of power. The objective of the DR aggregator is, therefore, to predict the DR dispatch at each time instance. This problem can be formulated as POCO, in which an estimate of the regulation signal is available to the load aggregator.

We consider N loads. Let 𝐱tN denote the decision variable at round t. The variable 𝐱t represents the instructions sent to the loads. Let rt be the regulation signal at time t. Let 𝐱¯,𝐱¯N be the maximum and minimum power that can be consumed or delivered for all loads. Define the decision set 𝒳={𝐱N|𝐱¯𝐱𝐱¯}. We let 𝐬tN denote the state of charge vectors of the loads at time t and 𝐜N the vector of load energy capacities. The state of charge of a load i at time t is st(i)=s0(i)+n=1txn(i). In the current case, we assume that there is no leakage nor energy losses. The OCO problem takes the following form:

min𝐱t𝒳(rt-𝟏𝐱t)2+σ𝐬t-1+𝐱t-𝐜22. (9)

The loss function has two terms: (i) a regulation term where the aggregated loads are dispatched to follow a regulation signal rt and (ii) a state of charge objective added to keep the loads near half their energy capacity. The loss function given in (9) is σ-strongly convex. For this reason we use the OGD for strongly convex functions (σOGD) proposed in [24], which offers tighter regret bound than the standard OGD. The following corollary gives an upper bound on the regret of predictive OGD for strongly convex function (σPOGD).

Corollary 3 (POGD for strongly convex functions)

Suppose ft is σ-strongly convex and satisfies Assumption 5 for all t. Consider the σOGD update

𝐱¯t+1 =𝐱t+η(proj𝒳(𝐱t-1γft(𝐱t))-𝐱t)

where η(0,1] and 0<γL. Then, the σPOGD with fixed step size, given that the assumptions of Lemma 3 hold for a ratio of the total rounds greater than ν, has a regret bounded above by

RegTd(σ𝙿𝙾𝙶𝙳) RegTd(σ𝙾𝙶𝙳)-Tνδ,
O(VT+1)-Tνδ.

The result follows from the proof of Theorem 1 and the σOGD regret bound from [24]. We now present simulation results. All optimizations are solved using CVXPY [11] and the ECOS [12] solver.

Fixed step size example.

The load and algorithm parameters for this example are gathered in Table 1. The initial state of charge of each load is set to half its capacity. The regulation signal is rt=0.2sin(2πTt)+wt. The parameter wtN(0,0.01) is a Gaussian noise used to model sudden changes. We assume that the aggregator has access to estimated gradient for different level of accuracy ϵ. This represents, for example when ϵ=0.01, a relative error of at least 4% of the actual gradient norm. The parameter σ is set to achieve adequate regulation performance without deviating too much from each load’s desired state of charge.

Table 1: Parameters for POCO simulations
Parameter Value Unit
N 25 loads
h 30 seconds
𝐱¯/h Uniform[1,3] kW
𝐱¯/h -𝐱¯ kW
𝐜 Uniform[10,15] kWh
ϵ 0.1, 0.05 & 0.01
δ 10-6
σ 0.005
η 1
γ L
β 1/L
Fig. 1: Regret comparison between POCO with fixed step size, OCO, and OMD (log scale, note: O(VT+1) and O(VT+1)-Tδν are superimposed when shown at this scale)

We now present the performance of our POCO algorithm with a fixed step size. We implement the OMD from [28] for comparison. This algorithm uses 𝐠t without validating the estimated information. Figure 1 shows an instance of the experimental regret for the POCO with three different values of ϵ, the conventional OCO algorithm, their respective regret bounds and OMD’s regret. POCO outperforms its bound, the OCO and the OMD algorithms. We remark that as expected the number of predictive updates increases with the accuracy of the estimated gradient, the performance of the POCO algorithm also improves.

Backtracking line search example.

We now present an example of POCO with backtracking. We consider a curtailment scenario. We let pt be the total power to be curtailed by the loads at time t for t=1,2,,T. When a contingency occurs in the network, flexible loads are called to curtail their power consumption, e.g., by temporarily shutting down their HVAC system. Contrary to the regulation case, the loads are not contracted to follow a setpoint and no penalties are assessed on loads curtailing more than asked. Similar to the regulation setting, the curtailment signal is unknown until immediately after the current round. This setting can be modeled as POCO where an estimated curtailment signal is available to the aggregator at each round. We use the same notation as the previous examples. Let []+=max{0,}. This curtailment scenario is modeled by loss function given below:

ft(𝐱t)=([pt-𝟏𝐱t]+)2+σα𝐬t-1+𝐱t-𝐜22 (10)

where we have added a recovery coefficient to the state of charge objective term used previously. This coefficient models the usual evolution of the load (e.g., ambient temperature heating for a thermostatic load). We let α=1.001. This is equivalent to a recovery coefficient of 1.13 per hour. The function ft given in (10) is not gradient Lipschitz and Assumption 5 does not hold. We model the curtailment signal to be quickly increasing at first and then slowly plateauing to represent new level of available generation. This event is assumed to be limited in time, after which the network goes back to its normal state and no curtailment is then required. We let pt=0.04t0.3+wt where wtN(0,0.01) for t=1,2,,T/4 and then pt=0.04(T/4)0.3+wt where wtN(0,0.001) for t=T/4,T/4+1,,T. The noise variance is equivalent to approximatively 10% of curtailment signal’s value at first and then about 1%.

Table 2: Different parameters for POCOb simulations
Parameter Value
α 1.001
ϵ 0.1, 0.01 & 0.001
M 100
σ 5×10-5
ζ 0.5
β 0.9
η 1/10T

We use the same parameters as in the previous section, except for those in Table 2. The POCOb experimental regret shown in Figure 2 is sublinear in the numbers of rounds and outperforms the OCO’s regret. While the performance is not as strong as POCO with fixed step size, this algorithm can be applied to a broader family of functions because it does not require the loss function to be gradient Lipschitz continuous. The difference in performances between the two POCO updates is explained by the fact that the sufficient conditions for the POCOb are rarely satisfied in this simulation. Improved estimation accuracy, ϵ, and a loss function with a lower maximum temporal change, Δ, could increase the number of times the backtracking line search is used. Nevertheless, the POCOb achieves a regret reduction of 29% when ϵ=1% over a standard OCO algorithm. Lastly, we note that POCOb performs better for larger variations in pt and smaller values of ϵ.

Fig. 2: Regret comparison between POCOb, OCO, and OMD (log scale, note: O(T(VT+1)) and O(T(VT+1))-Tδν are superimposed when shown at this scale)

7 Conclusion

In this work, we have presented the predictive online convex optimization framework. In POCO, a second update is used after the OCO update to improve performance using an estimated gradient. We have presented two versions of the predictive update that can be used under different assumptions. We have shown a regret upper bound for all of our POCO algorithms. We have applied POCO to demand response in electric power systems and found that they outperform conventional OCO using commonly available forecast information. In the case of fixed step size update, we observed an improvement of 95% in the final regret and of 29% in the backtracking case when having access to a (ϵ=0.01)-forecaster.

Acknowledgements

A.L.-L. thanks P. Mancarella for his support and for co-hosting him at The University of Melbourne.

A Proof of Lemma 1

Define 𝐞tn as 𝐞t=𝐠t-ft+1(𝐱¯t+1) where 𝐞tϵ by the definition of the ϵ-forecaster. A vector 𝐯 is a descent direction if 𝐯ft+1(𝐱¯t+1)<0. Thus, we have

0 <𝐠tft+1(𝐱¯t+1)
=𝐠t(𝐠t-𝐞t)

Equivalently, we have 𝐠t𝐠t>𝐠t𝐞t. Taking the norm of both sides and dividing by the norm of 𝐠t gives

𝐠t>𝐞tcosθ𝐠t,𝐞t, (A.1)

where θ𝐠t,𝐞 is the angle between 𝐠t and 𝐞t. By assumption, 𝐠t>ϵ and 𝐞tϵ. Therefore (A.1) always holds and we have proved the lemma.

B Proof of Lemma 2

The identity follows from [2, Proposition 6.1.1] with 𝐠t instead of the gradient of the loss function. It then follows from Lemma 1 that -βt𝐠t with βt>0 is a descent direction at 𝐱¯t+1. Thus, 𝐝t+1=𝐱t+1(βt)-𝐱¯t+1 is a feasible descent direction because 𝐝t+1𝒳 and 𝐠t𝐝t+1<0 for all t and 𝐱¯t+1𝒳.

C Proof of Lemma 3

By Assumption 5, ft+1 has an L-Lipschitz gradient. We use the following inequality from  [25, Theorem 2.1.5]

ft+1(𝐲) ft+1(𝐱)+ft+1(𝐱)(𝐲-𝐱) (C.1)
+L2𝐱-𝐲2

for all 𝐱,𝐲𝒳. We substitute 𝐲=𝐱t+1(β) and 𝐱=𝐱¯t+1 into (C.1) to obtain

ft+1(𝐱t+1(β)) ft+1(𝐱¯t+1)+L2𝐱t+1(β)-𝐱¯t+12
+ft+1(𝐱¯t+1)(𝐱t+1(β)-𝐱¯t+1).

For the reminder of the proof, we use 𝐝t+1=𝐱t+1(β)-𝐱¯t+1 to simplify the notation. We rewrite the gradient in term of the estimated gradient, which yields

ft+1(𝐱t+1(β)) ft+1(𝐱¯t+1)+𝐠t𝐝t+1-𝐞t𝐝t+1
+L2𝐝t+12. (C.2)

By assumption, 𝐱t+1(β)𝐱¯t+1, which ensures that Lemma 2 holds. We use Lemma 2 to upper bound the second term of the right-hand side of (C.2). We then have: ft+1(𝐱t+1(β))ft+1(𝐱¯t+1)-(1β-L2)𝐝t+12+ϵ𝐝t+1. Therefore, the predictive update with fixed step size will improve on the OCO update by a minimum of δ>0 if the following condition is satisfied:

1β𝐝t+12-L2𝐝t+12-ϵ𝐝t+1δ. (C.3)

Assuming 0<β1L, then 1βL, and if L2𝐝t+12-ϵ𝐝t+1δ, then (C.3) also holds for any β]0,1L]. Solving for the norm of the feasible descent direction 𝐝t+1, we have

𝐝t+1=𝐱t+1(β)-𝐱¯t+1ϵL+ϵ2L2+2δL. (C.4)

Thus, by setting 0<β1L and satisfying (C.4), we obtain ft+1(𝐱t+1(β))ft+1(𝐱¯t+1)-δ, where δ>0. This implies that the predictive update strictly improves over the OCO update when the feasible descent direction satisfies the condition (C.4). The improvement is bounded below by δ.

D Proof of Theorem 1

Let 𝐱^t denote the decision variable with βt=0 for all t. In other words, 𝐱^t represents the decision variable computed without the predictive algorithm. Denote the set of assumptions of Lemma 3 at round t by 𝒜t. Let 𝕀𝒜t be the indicator function where 𝕀𝒜t=1 if the assumptions are satisfied and 0 otherwise. Observe that the improvement, it, is given by

it𝕀𝒜t=ft(𝐱^t)-ft(𝐱t), (D.1)

where it is the improvement when 𝕀𝒜t=1. The regret of the POCO algorithm is

𝚁𝚎𝚐Td(POCO)=t=1Tft(𝐱t)-ft(𝐱t). (D.2)

Using (D.1), we re-express ft(𝐱t) in (D.2):

𝚁𝚎𝚐Td(POCO) =t=1Tft(𝐱^t)-ft(𝐱t)-t=1Tit𝕀𝒜t
=𝚁𝚎𝚐Td(OCO)-t=1Tit𝕀𝒜t (D.3)

By Lemma 3, the improvement it is bounded below by δ. We rewrite (D.3) as 𝚁𝚎𝚐Td(POCO)𝚁𝚎𝚐Td(OCO)-t=1Tδ𝕀𝒜t. A minimum of Tν rounds satisfy 𝒜t and hence 𝚁𝚎𝚐Td(POCO)𝚁𝚎𝚐Td(OCO)-Tνδ.

E Proof of Lemma 4

We show that for some step size βm, the estimated gradient descent projection leads to a sufficient decrease thus outperforming the OCO update. We show that if βm satisfies (7), then it also satisfies satisfies (6), ensuring a sufficient decrease in the objective function. Note that (7) is the condition under which the backtracking algorithm, Algorithm 2, is used. We can see from the left-hand side of the condition (7) that the update improves over the OCO update because the three last terms are bounded above by 0, i.e., 𝐠t𝐝t+1-1ζ𝐝t+12<0 by Lemma 2. Thus all three terms are less or equal to zero. By assumption, β>0, and these terms are also bounded away from zero since 𝐱t+1𝐱¯t+1.

We start from (7) and shows it implies (6). By assumption, 𝐞tϵ for all t and hence (7) implies,

ft(𝐱¯t+1+βm𝐝t+1) ft(𝐱¯t+1)+βm(𝐠t-𝐞t)𝐝t+1
-2Δ.

Rearranging the terms, we have

ft(𝐱¯t+1+βm𝐝t+1)+Δ ft(𝐱¯t+1)-Δ (E.1)
+βmft+1(𝐱¯t+1)𝐝t+1.

By assumption, ft+1 is time-Lipschitz with constant Δ>0 for all 𝐱𝒳 and all t. We can therefore bound below and above respectively the left-hand and right-hand side of (E.1). This leads to

ft+1(𝐱¯t+1+βm𝐝t+1) ft+1(𝐱¯t+1)
+βmft+1(𝐱¯t+1)𝐝t+1,

the modified Armijo condition (6).

F Proof of Theorem 2

Assume ft(𝐱¯t+1)-ft(𝐱)>2Δ. This assumption implies that ft+1(𝐱¯t+1)-ft+1(𝐱)>0 by Proposition 1. Thus, 𝐱¯t+1 is not the minimum point of ft+1. It follows that ft+1(𝐱¯t+1)0. By assumption, 𝐝t+1𝟎 is a feasible descent direction and we have

ft+1(𝐱¯t+1)𝐝t+1<0. (F.1)

Let a(0,1). Subtracting ft(𝐱¯t+1+aβ𝐝t+1)𝐝t+1 on both side of (F.1) we obtain,

(ft+1(𝐱¯t+1)- ft(𝐱¯t+1+aβ𝐝t+1))𝐝t+1< (F.2)
-ft(𝐱¯t+1+aβ𝐝t+1)𝐝t+1.

If the following condition holds, then (F.2) also holds:

ft+1(𝐱¯t+1) -ft(𝐱¯t+1+aβ𝐝t+1)𝐝t+1< (F.3)
-ft(𝐱¯t+1+aβ𝐝t+1)𝐝t+1.

Under Assumption 3, for all 𝐱,𝐳𝒳 we have

ft+1(𝐱)-ft(𝐳) ft+1(𝐱)+ft(𝐳)2G

and by Assumption 1, we have 𝐝t+1D. Then, if

2GD<-ft(𝐱¯t+1+aβm𝐝t+1)𝐝t+1 (F.4)

holds, so does (F.3). We rewrite (F.4) as

ft(𝐱¯t+1+aβ𝐝t+1)𝐝t+1<-2GD (F.5)

Recalling Taylor’s Theorem [33, Theorem 2.1]:

ft(𝐲+𝐩)=ft(𝐲)+ft(𝐲+a𝐩)𝐩,

where 𝐲,𝐱𝒳, 𝐩n and for some a(0,1). We let 𝐲=𝐱¯t+1 and 𝐩=βm𝐝t+1. We have,

ft(𝐱¯t+1+β𝐝t+1) =ft(𝐱¯t+1) (F.6)
+βft(𝐱¯t+1+aβ𝐝t+1)𝐝t+1.

We bound above the the last term of (F.6) using (F.5) and obtain

ft(𝐱¯t+1+β𝐝t+1)<ft(𝐱¯t+1)-2βGD (F.7)

By setting βΔGD in (F.7), we have ft(𝐱¯t+1+β𝐝t+1)<ft(𝐱¯t+1)-2Δ. This shows that there always exists at least one point which satisfies the assumption on the existence of 𝐱𝒳 such that ft(𝐱¯t+1)-ft(𝐱)>2Δ that is along the feasible descent direction 𝐝t+1 from 𝐱¯t+1.

Next, adapting the proof of [33, Lemma 3.1] for the modified Armijo condition for online settings (7), it follows that there exists β¯ΔGD such that

ft(𝐱¯t+1+β¯𝐝t+1) ft(𝐱¯t+1)+β¯𝐠t𝐝t+1-β¯ϵ𝐝t+1
-2Δ.

The set 𝒮 is therefore non-empty if there exists 𝐱𝒳 such that ft(𝐱¯t+1)-ft(𝐱)>2Δ.

We now show the converse. Assuming 𝒮, then there exists β¯𝒮 and

ft(𝐱¯t+1+β¯𝐝t+1) <ft(𝐱¯t+1)-2Δ (F.8)

holds since 𝐠t𝐝t+1<0 by Lemma 2 and ϵ>0. Thus, (F.8) implies that there exists 𝐱𝒳 such that ft(𝐱¯t+1)-ft(𝐱)>2Δ and one of such point is 𝐱=𝐱¯t+1+β2𝐝t+1. This completes the proof.

G Proof of Corollary 2

Since β>0, then 𝒮. By the converse of Theorem 2, we have ft(𝐱¯t+1)-ft(𝐱)>2Δ, where 𝐱=𝐱¯t+1+β𝐝t+1, the decision played by the predictive update (5). The predictive update hence improves on the OCO update by at least 2Δ.

H Proof of Theorem 3

Let 𝕀𝒜t be the indicator function where 𝕀𝒜t=1 if at round t, β>0 and 𝐠t>ϵ or 0 otherwise. Using the same approach as in Theorem 1’s proof with Corollary 2, we obtain the regret bound. The last term of (8) is strictly positive and thus the POCOb regret is always bounded above by the OCO algorithm regret.

References

  • [1] D. P. Bertsekas. Nonlinear programming. Journal of the Operational Research Society, 48(3):334–334, 1997.
  • [2] D. P. Bertsekas. Convex optimization algorithms. Athena Scientific Belmont, 2015.
  • [3] F. Borrelli, A. Bemporad, and M. Morari. Predictive control for linear and hybrid systems. Cambridge University Press, 2017.
  • [4] D. S Callaway. Tapping the energy storage potential in electric loads to deliver load following and regulation, with application to wind energy. Energy Conversion and Management, 50(5):1389–1400, 2009.
  • [5] D.S. Callaway and I. A. Hiskens. Achieving controllability of electric loads. Proceedings of the IEEE, 99(1):184–199, 2011.
  • [6] X. Cao, J. Zhang, and H. V. Poor. A virtual-queue based algorithm for constrained online convex optimization with applications to data center resource allocation. IEEE Journal of Selected Topics in Signal Processing, 2018.
  • [7] T. Chen and G. B. Giannakis. Bandit convex optimization for scalable and dynamic iot management. IEEE Internet of Things Journal, 2018.
  • [8] T. Chen, Q. Ling, and G. B. Giannakis. An online convex optimization approach to proactive network resource allocation. IEEE Transactions on Signal Processing, 65(24):6350–6364, 2017.
  • [9] C.-K. Chiang, T. Yang, C.-J. Lee, M. Mahdavi, C.-J. Lu, R. Jin, and S. Zhu. Online optimization with gradual variations. In Conference on Learning Theory, pages 6–1, 2012.
  • [10] O. Dekel, A. Flajolet, N. Haghtalab, and P. Jaillet. Online learning with a hint. In Advances in Neural Information Processing Systems, pages 5305–5314, 2017.
  • [11] S. Diamond and S. Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
  • [12] A. Domahidi, E. Chu, and S. Boyd. ECOS: An SOCP solver for embedded systems. In European Control Conference (ECC), pages 3071–3076, 2013.
  • [13] J. C. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari. Composite objective mirror descent. In COLT, pages 14–26, 2010.
  • [14] C. E. Garcia, D. M. Prett, and M. Morari. Model predictive control: theory and practice—a survey. Automatica, 25(3):335–348, 1989.
  • [15] E. Hazan. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
  • [16] E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.
  • [17] E. Hazan and N. Megiddo. Online learning with prior knowledge. In International Conference on Computational Learning Theory, pages 499–513. Springer, 2007.
  • [18] N. Ho-Nguyen and F. Kılınç-Karzan. cAccelerating optimization under uncertainty via online convex optimization. Technical report, August, 2016.
  • [19] A. Jadbabaie, A. Rakhlin, S. Shahrampour, and K. Sridharan. Online optimization: Competing with dynamic comparators. In Artificial Intelligence and Statistics, pages 398–406, 2015.
  • [20] S.-J. Kim and G. B. Giannakis. An online convex optimization approach to real-time energy pricing for demand response. IEEE Transactions on Smart Grid, 8(6):2784–2793, 2017.
  • [21] A.e Lesage-Landry and J.A. Taylor. Setpoint tracking with partially observed loads. IEEE Transactions on Power Systems, 33(5):5615 – 5627, 2018.
  • [22] J. L. Mathieu, S. Koch, and D. S. Callaway. State estimation and control of electric loads to manage real-time energy imbalance. IEEE Transactions on Power Systems, 28(1):430–440, 2013.
  • [23] M. Mohri and S. Yang. Accelerating online convex optimization via adaptive prediction. In Artificial Intelligence and Statistics, pages 848–856, 2016.
  • [24] A. Mokhtari, S. Shahrampour, A. Jadbabaie, and A. Ribeiro. Online optimization in dynamic environments: Improved regret rates for strongly convex problems. In Decision and Control (CDC), 2016 IEEE 55th Conference on, pages 7195–7201. IEEE, 2016.
  • [25] Y. Nesterov. Introductory lectures on convex programming volume i: Basic course. Lecture notes, 1998.
  • [26] P. Palensky and D. Dietrich. Demand side management: Demand response, intelligent energy systems, and smart loads. IEEE transactions on industrial informatics, 7(3):381–388, 2011.
  • [27] A. Rakhlin and K. Sridharan. Online learning with predictable sequences. In Proceedings of the 26th Annual Conference on Learning Theory (COLT), pages 1–27, 2013.
  • [28] S. Rakhlin and K. Sridharan. Optimization, learning, and games with predictable sequences. In Advances in Neural Information Processing Systems, pages 3066–3074, 2013.
  • [29] M. Schmidt, N. L. Roux, and F. R. Bach. Convergence rates of inexact proximal-gradient methods for convex optimization. In Advances in neural information processing systems, pages 1458–1466, 2011.
  • [30] S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2):107–194, 2012.
  • [31] J. Steinhardt and P. Liang. Adaptivity and optimism: An improved exponentiated gradient algorithm. In International Conference on Machine Learning, pages 1593–1601, 2014.
  • [32] J. A. Taylor, S. V. Dhople, and D. S. Callaway. Power systems without fuel. Renewable and Sustainable Energy Reviews, 57:1322–1336, 2016.
  • [33] S. Wright and J. Nocedal. Numerical optimization. Springer Science, 35(67-68), 1999.
  • [34] S. Yang and M. Mohri. Optimistic bandit convex optimization. In Advances in Neural Information Processing Systems, pages 2297–2305, 2016.
  • [35] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 928–936, 2003.