We incorporate future information in the form of the estimated value offuture gradients in online convex optimization. This is motivated by demandresponse in power systems, where forecasts about the current round, e.g., theweather or the loads' behavior, can be used to improve on predictions made withonly past observations. Specifically, we introduce an additional predictivestep that follows the standard online convex optimization step when certainconditions on the estimated gradient and descent direction are met. We showthat under these conditions and without any assumptions on the predictabilityof the environment, the predictive update strictly improves on the performanceof the standard update. We give two types of predictive update for variousfamily of loss functions. We provide a regret bound for each of our predictiveonline convex optimization algorithms. Finally, we apply our framework to anexample based on demand response which demonstrates its superior performance toa standard online convex optimization algorithm.
Quick Read (beta)
Predictive Online Convex Optimization\thanksreffootnoteinfo
We incorporate future information in the form of the estimated value of future gradients in online convex optimization. This is motivated by demand response in power systems, where forecasts about the current round, e.g., the weather or the loads’ behavior, can be used to improve on predictions made with only past observations. Specifically, we introduce an additional predictive step that follows the standard online convex optimization step when certain conditions on the estimated gradient and descent direction are met. We show that under these conditions and without any assumptions on the predictability of the environment, the predictive update strictly improves on the performance of the standard update. We give two types of predictive update for various family of loss functions. We provide a regret bound for each of our predictive online convex optimization algorithms. Finally, we apply our framework to an example based on demand response which demonstrates its superior performance to a standard online convex optimization algorithm.
ucb]Antoine Lesage-Landry, is]Iman Shames, ut]Joshua A. Taylor
Energy & Resources Group, University of California, Berkeley, USA
Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, Australia
The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
Key words: Convex optimization; learning algorithms; machine learning; power systems; renewable energy systems; load dispatching
Online convex optimization (OCO) has found applications in fields like network resource allocation [8, 7, 6] and demand response in power systems [20, 21]. It is used for sequential decision-making when contextual information or feedback is only revealed to the decision maker at the end of the current round. Theoretical results showing that OCO algorithms have bounded regret guarantee the performance of these algorithms under mild assumptions.
In many applications, the decision maker has access to both revealed past information and estimates about future rounds. For example, in power systems, weather forecasts or historical load patterns can be used to estimate the future regulation needs [22, 4]. In this work, we present the predictive online convex optimization (POCO) framework. POCO works under the assumption that an estimate of the gradient of the loss function for the next round is available to the decision maker. In POCO, a standard OCO update is first applied using past information to compute the next decision. Then, the decision maker checks the quality of the estimated information available to them. If the estimated gradient is considered accurate enough, the decision maker implements an additional projected gradient step based on the estimated gradient to improve their decision for this round. This last step is referred as the predictive update.
We introduce explicit criteria for determining if the quality of the estimated gradient is high enough to guarantee an improvement over a standard OCO step when the predictive update is applied. A regret bound is obtained for all our algorithms. We conclude this work by presenting numerical examples where a POCO algorithm is used to improve on the performance of demand response with standard OCO. This example is motivated by the fact that a load aggregator often has access to an estimate of the power imbalance they have to counteract for regulation purposes.
Literature review. Recent work in online convex optimization has focused on including prior or future information. Reference , which builds on , assumes that the problem’s unknown and uncertain parameters follow a predictable process plus some noise  for their OCO algorithm. As in our setting, a second update with an estimated gradient-like term follows a mirror descent update. This second update is used by the algorithm in every step regardless of the quality of the estimated gradient. For this reason, the algorithm is referred to as optimistic. Optimistic algorithms were also studied in [31, 23, 34]. No conditions are provided about the estimated gradient in this case except that it comes from past observations and/or side information via an oracle. The authors of  show that the optimistic mirror descent can lead to a tighter bound than a standard online mirror descent algorithm if the process is indeed predictable. In , the authors provide a dynamic regret bound for the optimistic mirror descent. There is, however, no guarantee that in a given round the optimistic update does not do worse than the standard OCO update. An algorithm similar to  is given in . In their work, they make the stronger assumption in which the exact gradient of the next round loss function is available and then provide a static regret bound for their setting. This differs from our setting in that we provide dynamic regret-bounded algorithms and use an estimated gradient which entails less restrictive assumptions. Several other authors have studied different ways to incorporate future information in OCO like using state information  or the direction of the loss function’s gradient in an online linear optimization setting .
The projected gradient descent, inexact gradient descent, and proximal algorithms [1, 29, 2] from conventional convex optimization resemble our setting. These algorithms differ from ours because they aim to minimize the same objective function throughout all descent steps. In OCO, we minimize a sequence of objective functions and at each time provide a decision to minimize the current loss function. The loss function in a given round is only observed after we have committed to a decision. OCO will be introduced formally in Section 2.
Model predictive control (MPC) [14, 3] is another widely-used sequential decision-making framework. In MPC, the decision maker solves to optimality a receding horizon optimization problem that relies on models of future round loss functions. This thus requires significantly more contextual information and computational resources. These limitations are absent in OCO, making it a more suitable tool for real-time decision making with small computational resources.
Because we characterize conditions under which the predictive step improves performance, we guarantee improvement over conventional OCO and require no predictability assumptions. These conditions can be checked at each round of OCO, and if satisfied, the predictive update is implemented. In sum, in this work we make the following contributions:
We introduce a novel predictive online convex optimization framework and provide conditions for when to use side information.
We propose a predictive update with a predetermined step size for loss functions that have a Lipschitz gradient. We show that this update leads to a strict improvement over an OCO update when used (Section 4).
We give a predictive update with backtracking line search that applies to a broader family of problems. We show that it leads to strict improvement over an OCO update (Section 5).
We obtain sublinear regret bounds in the number of rounds for all algorithms.
We apply our framework to demand response in power systems and find that it outperforms a standard OCO algorithm (Section 6).
In OCO, one must make a decision at each round to minimize their cumulative loss [30, 15]. The current round’s loss function and any other round-dependent parameters are not available at the moment when the decision is made. Only information about previous rounds can be used to make the decision. Once the decision has been made, information about the current round is observed.
Let denote the current round index and be the time horizon. Let , , be the decision set, and let be the decision variable at time . We denote the differentiable convex loss function by for . Let be the Euclidean norm. We denote the projection operator onto the set as .
The goal of the decision maker is to sequentially solve the following sequence of problems:
for . The decision maker observes the loss function after choosing . For this reason, even if the loss function has a simple form, an analytical solution to the round optimization problem (2) is not obtainable. The decision is computed using a gradient descent-based , mirrored descent-based  or Newton step-based rule . For example, in the online gradient descent (OGD)  algorithm, the decision at round , , is given by the update:
where to guarantee a sublinear upper bound on the dynamic regret of OGD [35, Theorem 2].
The set is convex and compact.
The decision set represents all constraints on . In this version of OCO, we only consider time-invariant constraints.
The loss function is -bounded: for and .
The gradient of the loss function is -bounded: for and .
As a consequence of Assumption 1, the decision variable is also -bounded: for . We define the diameter of the compact set as and let , a positive scalar. The remainder of the assumptions will be stated when a specific technical result requires it.
where . The dynamic regret compares the loss suffered by the decision maker to optimal performance in each round. Other versions of the regret exists, e.g., static regret [30, 15, 35], which is defined in terms of the optimal stationary decision, in (2). In this work, we only consider the dynamic regret because it yields a stronger theoretical guarantee. This theoretical guarantee is also more relevant in the context of time-varying optimization. For this reason, we refer to the dynamic regret, , simply as the regret. Note that a bounded dynamic regret implies a bounded static regret . The goal when designing an OCO algorithm is to show that the regret is sublinearly bounded above in the number of rounds. An OCO algorithm with a sublinearly bounded dynamic regret in the number of rounds will on average perform as well as the round optimal decision at each round [30, 15, 16].
We conclude this section by defining the quantity . The term quantifies the variation of the optimal predictions through all rounds.
3 Predictive OCO
We now introduce our POCO framework. We let be the decision computed by an OCO algorithm in round . This OCO algorithm can be, for example, the aforementioned OGD. The decision is then given by the update (2). In POCO, we consider an -forecaster introduced in Assumption 4. Let be the estimated gradient of the loss function at .
Assumption 4 (-forecaster)
The -forecaster has access to an estimate of the gradient of the next round’s loss function evaluated at , , and the maximum estimation error, , such that for all time .
In other words, we consider a forecaster that has access to limited information about the next round in the form of . This could represent a prediction based on historical data, e.g., a weather or demand forecast. The decision maker uses this information to improve on the OCO update. For conciseness, we denote the estimated gradient by . We omit its dependency on because it is always evaluated at the OCO update output, , and no other points. The decision maker can meet Assumption 4 by relying on an exogenous model to estimate the gradient . In the context of demand response, historical data of the load’s consumption and generator output’s patterns, weather history and the historical values of the gradient, for example, can be used to build a statistical model to estimate the value of the at the decision given by OCO update. The parameter can then be set according to, for example, a high confidence interval or a worst-case performance parameter. The forecaster would then provide using this model.
Then, if certain conditions are met, the following update rule for our proposed POCO algorithm is used.
Definition 1 (Predictive update)
Let be an appropriately chosen step size. The predictive update is
The predictive update is to be used directly after the OCO update and will lead to a strict improvement over the OCO update under certain conditions. The aforementioned conditions will be discussed in the next sections and depend on the properties of the loss function. If the conditions are not met, is directly used. Let be the desired improvement when using the predictive update. We define the counter :
with . The variable represents the number of predictive updates as described in Definition 1. Let be the ratio of rounds using the predictive update to the total number of rounds.
Depending on the loss function, any regret-bounded OCO update can be used in the POCO framework. Back to the OGD example, the predictive OGD uses the update (2) and if certain conditions are met, is provided by (1) and if not, . We write as a function of the step size and let be the descent direction.
Next, we provide sufficient conditions for the estimated gradient to be a feasible descent direction. Later, we consider the step size selection problem. Particularly, two cases are considered where (i) the step sizes are constant and chosen a priori based on a property of the loss functions, or (ii) the step sizes are selected through the application of a backtracking line search that enforces a modified online version of the Armijo condition .
The following lemma introduces a sufficient condition for the estimated gradient to be a descent direction of the OCO problem (2).
Lemma 1 (Estimated descent direction)
The vector provided by the -forecaster is a descent direction for if .
Lemma 2 (Feasible estimated descent direction)
For all and , if and , then is a feasible descent direction at and
4 POCO with fixed step size
We now present a predictive update where step sizes are fixed and based on a propriety of the sequence of loss functions. We conclude this section by providing regret-bounded algorithms using these updates. In this section, we add the following assumption:
Let . The loss function has an -uniformly Lipschitz-continuous gradient: for all and .
We propose a predictive update with fixed step size next. We state sufficient conditions that guarantee a strict improvement over an OCO update. These sufficient conditions can be checked at each round to determine if the estimated information is accurate enough, and therefore if the predictive update should be used in the current round.
Lemma 3 (Predictive update with fixed step size)
The proof of Lemma 3 is provided in Appendix C. We now present regret bounds for POCO algorithms. This algorithm uses the predictive update with fixed step size to improve the performance of OCO algorithms.
Theorem 1 (POCO regret bound)
Consider an OCO algorithm with a sublinear regret upper bound. Suppose that the forecaster uses the predictive update (1) only at rounds when the estimated gradient and feasible descent direction satisfy the assumptions of Lemma 3. If the ratio of rounds satisfying these assumptions is greater than , then the regret of the POCO algorithm is bounded above by
Corollary 1 ( regret bound for POGD)
Suppose that the ratio of rounds that respects the assumptions of Lemma 3 is . Then the predictive OGD algorithm’s regret is bounded above by
which is sublinear and tighter than the OGD regret bound.
5 POCO with backtracking line search
In this section, we do not require Assumption 5 to hold. We however use the following proposition:
The loss function is -time-Lipschitz with , that is:
for all at all , and all .
Proposition 1 always holds because Assumption 2 implies that is sufficient. Under Proposition 1, we consider functions that are -locally and globally Lipschitz in their time argument, respectively, for the intermediary bound () and the upper bound (). This can represent, for example, loss functions like squared tracking error functions, in which the time-varying targets are always contained in a closed set.
Definition 2 (POCOb update)
Let be a positive scalar and be determined by a backtracking line search algorithm. The predictive update with backtracking line search is
algorithm[tb] \[email protected]@algorithmic \STATEParameters: Given and . \STATEInitialization: Set .
\STATE. \WHILE and \STATE.
The next lemma shows that the backtracking line search-based predictive update improves on the OCO update. Our claim relies on the modified Armijo condition for gradient projection. This condition ensures a sufficient decrease in the objective when using an estimated gradient projection descent direction . We adapt this condition to the estimated gradient and online setting. The modified Armijo condition for gradient projection  on and feasible descent direction for some with step size is given by:
Lemma 4 (Sufficient decrease of POCOb update)
The proof of the lemma can be found in Appendix E.
Note that there is an additional term in the modified Armijo condition for estimated gradient projection. This is a consequence of not having access to the exact gradient of . Hence, to ensure that the update is valid, the modified Armijo condition is augmented by a term proportional to the error of the estimated gradient. The second additional term, , is due to the time-varying setting of OCO.
We now discuss the existence of step sizes that satisfy (7) at round . Before stating the main result, for a given and , define the set of step sizes that comply with line 5 in the line search algorithm, which is the modified Armijo condition for online settings (7):
Suppose is a feasible descent direction and is bounded below for all . Then there exists such that if and only if .
The proof for Theorem 2 is presented in Appendix F. We note that Theorem 2 does not guarantee that the backtracking algorithm, Algorithm 2, will find a non-zero step size. Other techniques like exact line searches, might be required to identify an adequate step size in some problem instances. Using Theorem 2, we can provide a lower bound on the improvement of the predictive update with backtracking line search.
Corollary 2 (POCOb update improvement)
Suppose that the assumptions of Lemma 4 hold and , then the predictive update with backtracking line search improves on the OCO update by a minimum of .
Theorem 3 (POCOb regret bound)
Consider an OCO algorithm with bounded regret. Suppose that the assumptions of Lemma 4 are met. If the ratio of rounds with and satisfying these assumptions to is greater than , then the regret of the POCO algorithm with backtracking used by the -forecaster is bounded above by
and thus outperforms the OCO algorithm.
In this section, we apply POCO algorithms to demand response (DR) in power systems [5, 26], specifically regulation and curtailment. At each time step, a DR aggregator sends instructions to their loads to follow a regulation signal, e.g., a power imbalance due to a sudden change in renewable power generation [4, 32]. Each load responds to the signal by adjusting its power consumption. The power consumption is constrained by a storage capacity, which could represent physical storage like a battery or the load’s limits, e.g., thermal constraints. The regulation signal is unknown at the time the DR instructions are sent. This can be due, for example, to a drop in renewable power generation which is only assessed after the generator has committed to some amount of power. The objective of the DR aggregator is, therefore, to predict the DR dispatch at each time instance. This problem can be formulated as POCO, in which an estimate of the regulation signal is available to the load aggregator.
We consider loads. Let denote the decision variable at round . The variable represents the instructions sent to the loads. Let be the regulation signal at time . Let be the maximum and minimum power that can be consumed or delivered for all loads. Define the decision set . We let denote the state of charge vectors of the loads at time and the vector of load energy capacities. The state of charge of a load at time is . In the current case, we assume that there is no leakage nor energy losses. The OCO problem takes the following form:
The loss function has two terms: (i) a regulation term where the aggregated loads are dispatched to follow a regulation signal and (ii) a state of charge objective added to keep the loads near half their energy capacity. The loss function given in (9) is -strongly convex. For this reason we use the OGD for strongly convex functions (OGD) proposed in , which offers tighter regret bound than the standard OGD. The following corollary gives an upper bound on the regret of predictive OGD for strongly convex function (POGD).
Corollary 3 (POGD for strongly convex functions)
Fixed step size example.
The load and algorithm parameters for this example are gathered in Table 1. The initial state of charge of each load is set to half its capacity. The regulation signal is . The parameter is a Gaussian noise used to model sudden changes. We assume that the aggregator has access to estimated gradient for different level of accuracy . This represents, for example when , a relative error of at least of the actual gradient norm. The parameter is set to achieve adequate regulation performance without deviating too much from each load’s desired state of charge.
We now present the performance of our POCO algorithm with a fixed step size. We implement the OMD from  for comparison. This algorithm uses without validating the estimated information. Figure 1 shows an instance of the experimental regret for the POCO with three different values of , the conventional OCO algorithm, their respective regret bounds and OMD’s regret. POCO outperforms its bound, the OCO and the OMD algorithms. We remark that as expected the number of predictive updates increases with the accuracy of the estimated gradient, the performance of the POCO algorithm also improves.
Backtracking line search example.
We now present an example of POCO with backtracking. We consider a curtailment scenario. We let be the total power to be curtailed by the loads at time for . When a contingency occurs in the network, flexible loads are called to curtail their power consumption, e.g., by temporarily shutting down their HVAC system. Contrary to the regulation case, the loads are not contracted to follow a setpoint and no penalties are assessed on loads curtailing more than asked. Similar to the regulation setting, the curtailment signal is unknown until immediately after the current round. This setting can be modeled as POCO where an estimated curtailment signal is available to the aggregator at each round. We use the same notation as the previous examples. Let . This curtailment scenario is modeled by loss function given below:
where we have added a recovery coefficient to the state of charge objective term used previously. This coefficient models the usual evolution of the load (e.g., ambient temperature heating for a thermostatic load). We let . This is equivalent to a recovery coefficient of per hour. The function given in (10) is not gradient Lipschitz and Assumption 5 does not hold. We model the curtailment signal to be quickly increasing at first and then slowly plateauing to represent new level of available generation. This event is assumed to be limited in time, after which the network goes back to its normal state and no curtailment is then required. We let where for and then where for . The noise variance is equivalent to approximatively of curtailment signal’s value at first and then about .
We use the same parameters as in the previous section, except for those in Table 2. The POCOb experimental regret shown in Figure 2 is sublinear in the numbers of rounds and outperforms the OCO’s regret. While the performance is not as strong as POCO with fixed step size, this algorithm can be applied to a broader family of functions because it does not require the loss function to be gradient Lipschitz continuous. The difference in performances between the two POCO updates is explained by the fact that the sufficient conditions for the POCOb are rarely satisfied in this simulation. Improved estimation accuracy, , and a loss function with a lower maximum temporal change, , could increase the number of times the backtracking line search is used. Nevertheless, the POCOb achieves a regret reduction of 29% when over a standard OCO algorithm. Lastly, we note that POCOb performs better for larger variations in and smaller values of .
In this work, we have presented the predictive online convex optimization framework. In POCO, a second update is used after the OCO update to improve performance using an estimated gradient. We have presented two versions of the predictive update that can be used under different assumptions. We have shown a regret upper bound for all of our POCO algorithms. We have applied POCO to demand response in electric power systems and found that they outperform conventional OCO using commonly available forecast information. In the case of fixed step size update, we observed an improvement of in the final regret and of in the backtracking case when having access to a -forecaster.
A.L.-L. thanks P. Mancarella for his support and for co-hosting him at The University of Melbourne.
A Proof of Lemma 1
Define as where by the definition of the -forecaster. A vector is a descent direction if . Thus, we have
Equivalently, we have . Taking the norm of both sides and dividing by the norm of gives
where is the angle between and . By assumption, and . Therefore (A.1) always holds and we have proved the lemma.
B Proof of Lemma 2
C Proof of Lemma 3
for all . We substitute and into (C.1) to obtain
For the reminder of the proof, we use to simplify the notation. We rewrite the gradient in term of the estimated gradient, which yields
By assumption, , which ensures that Lemma 2 holds. We use Lemma 2 to upper bound the second term of the right-hand side of (C.2). We then have: . Therefore, the predictive update with fixed step size will improve on the OCO update by a minimum of if the following condition is satisfied:
Assuming , then , and if , then (C.3) also holds for any . Solving for the norm of the feasible descent direction , we have
Thus, by setting and satisfying (C.4), we obtain where . This implies that the predictive update strictly improves over the OCO update when the feasible descent direction satisfies the condition (C.4). The improvement is bounded below by .
D Proof of Theorem 1
Let denote the decision variable with for all . In other words, represents the decision variable computed without the predictive algorithm. Denote the set of assumptions of Lemma 3 at round by . Let be the indicator function where if the assumptions are satisfied and otherwise. Observe that the improvement, , is given by
where is the improvement when . The regret of the POCO algorithm is
E Proof of Lemma 4
We show that for some step size , the estimated gradient descent projection leads to a sufficient decrease thus outperforming the OCO update. We show that if satisfies (7), then it also satisfies satisfies (6), ensuring a sufficient decrease in the objective function. Note that (7) is the condition under which the backtracking algorithm, Algorithm 2, is used. We can see from the left-hand side of the condition (7) that the update improves over the OCO update because the three last terms are bounded above by , i.e., by Lemma 2. Thus all three terms are less or equal to zero. By assumption, , and these terms are also bounded away from zero since .
Rearranging the terms, we have
By assumption, is time-Lipschitz with constant for all and all . We can therefore bound below and above respectively the left-hand and right-hand side of (E.1). This leads to
the modified Armijo condition (6).
F Proof of Theorem 2
Assume . This assumption implies that by Proposition 1. Thus, is not the minimum point of . It follows that . By assumption, is a feasible descent direction and we have
Let . Subtracting on both side of (F.1) we obtain,
If the following condition holds, then (F.2) also holds:
Under Assumption 3, for all we have
and by Assumption 1, we have . Then, if
Recalling Taylor’s Theorem [33, Theorem 2.1]:
where , and for some . We let and . We have,
By setting in (F.7), we have . This shows that there always exists at least one point which satisfies the assumption on the existence of such that that is along the feasible descent direction from .
The set is therefore non-empty if there exists such that .
G Proof of Corollary 2
H Proof of Theorem 3
-  D. P. Bertsekas. Nonlinear programming. Journal of the Operational Research Society, 48(3):334–334, 1997.
-  D. P. Bertsekas. Convex optimization algorithms. Athena Scientific Belmont, 2015.
-  F. Borrelli, A. Bemporad, and M. Morari. Predictive control for linear and hybrid systems. Cambridge University Press, 2017.
-  D. S Callaway. Tapping the energy storage potential in electric loads to deliver load following and regulation, with application to wind energy. Energy Conversion and Management, 50(5):1389–1400, 2009.
-  D.S. Callaway and I. A. Hiskens. Achieving controllability of electric loads. Proceedings of the IEEE, 99(1):184–199, 2011.
-  X. Cao, J. Zhang, and H. V. Poor. A virtual-queue based algorithm for constrained online convex optimization with applications to data center resource allocation. IEEE Journal of Selected Topics in Signal Processing, 2018.
-  T. Chen and G. B. Giannakis. Bandit convex optimization for scalable and dynamic iot management. IEEE Internet of Things Journal, 2018.
-  T. Chen, Q. Ling, and G. B. Giannakis. An online convex optimization approach to proactive network resource allocation. IEEE Transactions on Signal Processing, 65(24):6350–6364, 2017.
-  C.-K. Chiang, T. Yang, C.-J. Lee, M. Mahdavi, C.-J. Lu, R. Jin, and S. Zhu. Online optimization with gradual variations. In Conference on Learning Theory, pages 6–1, 2012.
-  O. Dekel, A. Flajolet, N. Haghtalab, and P. Jaillet. Online learning with a hint. In Advances in Neural Information Processing Systems, pages 5305–5314, 2017.
-  S. Diamond and S. Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
-  A. Domahidi, E. Chu, and S. Boyd. ECOS: An SOCP solver for embedded systems. In European Control Conference (ECC), pages 3071–3076, 2013.
-  J. C. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari. Composite objective mirror descent. In COLT, pages 14–26, 2010.
-  C. E. Garcia, D. M. Prett, and M. Morari. Model predictive control: theory and practice—a survey. Automatica, 25(3):335–348, 1989.
-  E. Hazan. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
-  E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.
-  E. Hazan and N. Megiddo. Online learning with prior knowledge. In International Conference on Computational Learning Theory, pages 499–513. Springer, 2007.
-  N. Ho-Nguyen and F. Kılınç-Karzan. Accelerating optimization under uncertainty via online convex optimization. Technical report, August, 2016.
-  A. Jadbabaie, A. Rakhlin, S. Shahrampour, and K. Sridharan. Online optimization: Competing with dynamic comparators. In Artificial Intelligence and Statistics, pages 398–406, 2015.
-  S.-J. Kim and G. B. Giannakis. An online convex optimization approach to real-time energy pricing for demand response. IEEE Transactions on Smart Grid, 8(6):2784–2793, 2017.
-  A.e Lesage-Landry and J.A. Taylor. Setpoint tracking with partially observed loads. IEEE Transactions on Power Systems, 33(5):5615 – 5627, 2018.
-  J. L. Mathieu, S. Koch, and D. S. Callaway. State estimation and control of electric loads to manage real-time energy imbalance. IEEE Transactions on Power Systems, 28(1):430–440, 2013.
-  M. Mohri and S. Yang. Accelerating online convex optimization via adaptive prediction. In Artificial Intelligence and Statistics, pages 848–856, 2016.
-  A. Mokhtari, S. Shahrampour, A. Jadbabaie, and A. Ribeiro. Online optimization in dynamic environments: Improved regret rates for strongly convex problems. In Decision and Control (CDC), 2016 IEEE 55th Conference on, pages 7195–7201. IEEE, 2016.
-  Y. Nesterov. Introductory lectures on convex programming volume i: Basic course. Lecture notes, 1998.
-  P. Palensky and D. Dietrich. Demand side management: Demand response, intelligent energy systems, and smart loads. IEEE transactions on industrial informatics, 7(3):381–388, 2011.
-  A. Rakhlin and K. Sridharan. Online learning with predictable sequences. In Proceedings of the 26th Annual Conference on Learning Theory (COLT), pages 1–27, 2013.
-  S. Rakhlin and K. Sridharan. Optimization, learning, and games with predictable sequences. In Advances in Neural Information Processing Systems, pages 3066–3074, 2013.
-  M. Schmidt, N. L. Roux, and F. R. Bach. Convergence rates of inexact proximal-gradient methods for convex optimization. In Advances in neural information processing systems, pages 1458–1466, 2011.
-  S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2):107–194, 2012.
-  J. Steinhardt and P. Liang. Adaptivity and optimism: An improved exponentiated gradient algorithm. In International Conference on Machine Learning, pages 1593–1601, 2014.
-  J. A. Taylor, S. V. Dhople, and D. S. Callaway. Power systems without fuel. Renewable and Sustainable Energy Reviews, 57:1322–1336, 2016.
-  S. Wright and J. Nocedal. Numerical optimization. Springer Science, 35(67-68), 1999.
-  S. Yang and M. Mohri. Optimistic bandit convex optimization. In Advances in Neural Information Processing Systems, pages 2297–2305, 2016.
-  M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 928–936, 2003.