Abstract
We propose a novel data-driven method to accelerate the convergence ofAlternating Direction Method of Multipliers (ADMM) for solving distributed DCoptimal power flow (DC-OPF) where lines are shared between independent networkpartitions. Using previous observations of ADMM trajectories for a given systemunder varying load, the method trains a recurrent neural network (RNN) topredict the converged values of dual and consensus variables. Given a newrealization of system load, a small number of initial ADMM iterations is takenas input to infer the converged values and directly inject them into theiteration. We empirically demonstrate that the online injection of these valuesinto the ADMM iteration accelerates convergence by a significant factor forpartitioned 14-, 118- and 2848-bus test systems under differing load scenarios.The proposed method has several advantages: it maintains the security ofprivate decision variables inherent in consensus ADMM; inference is fast and somay be used in online settings; RNN-generated predictions can dramaticallyimprove time to convergence but, by construction, can never result ininfeasible ADMM subproblems; it can be easily integrated into existing softwareimplementations. While we focus on the ADMM formulation of distributed DC-OPFin this paper, the ideas presented are naturally extended to other distributedoptimization problems.
Quick Read (beta)
Learning-Accelerated ADMM for Distributed DC Optimal Power Flow ^{†}^{†}thanks: This work was authored by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding was provided by the Autonomous Energy Systems project funded by the National Renewable Energy Laboratory’s Laboratory Directed Research and Development program. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.
Abstract
We propose a novel data-driven method to accelerate the convergence of Alternating Direction Method of Multipliers (ADMM) for solving distributed DC optimal power flow (DC-OPF) where lines are shared between independent network partitions. Using previous observations of ADMM trajectories for a given system under varying load, the method trains a recurrent neural network (RNN) to predict the converged values of dual and consensus variables. Given a new realization of system load, a small number of initial ADMM iterations is taken as input to infer the converged values and directly inject them into the iteration. We empirically demonstrate that the online injection of these values into the ADMM iteration accelerates convergence by a significant factor for partitioned 14-, 118- and 2848-bus test systems under differing load scenarios. The proposed method has several advantages: it maintains the security of private decision variables inherent in consensus ADMM; inference is fast and so may be used in online settings; RNN-generated predictions can dramatically improve time to convergence but, by construction, can never result in infeasible ADMM subproblems; it can be easily integrated into existing software implementations. While we focus on the ADMM formulation of distributed DC-OPF in this paper, the ideas presented are naturally extended to other distributed optimization problems.
DC optimal power flow, recurrent neural network, alternating direction method of multipliers, machine learning, data-driven optimization
I Introduction
The electric power grid is continually progressing towards a more complex, uncertain, and decentralized state. This fact stems from a variety of sources including higher penetration of renewable generation, increased presence of smart devices and subsystems, and market deregulation, to name a few. While this progression presents a number of operational and analytical challenges, the corresponding increase in available data paves the way for new approaches to improving system-wide efficiency and coordination.
In this paper, we consider the specific operational problem of direct current optimal power flow (DC-OPF) in which the network is decomposed into independently operating partitions. Because each partition is connected to others via one or more branches, agreement on these line flows is required as part of an optimal solution. While there are many possible ways to solve this consensus problem in a distributed fashion, we focus on the Alternating Direction Method of Multipliers (ADMM) due to its recent popularity for solving such problems [1, 2, 3, 4]. We note, however, that the ideas proposed in this paper are readily extensible to other iterative solution techniques.
In the spirit of several recent analyses of ADMM [5, 6], we view the iterations as the time steps of a discrete, stable dynamical system whose equilibria correspond to an optimal solution of the underlying optimization problem. The proposed method, which we call learning-accelerated ADMM (LA-ADMM), aims to leverage previously observed trajectories as training data to build a machine learning (ML) model to predict the equilibrium values using only a small number of initial iterates. Once such a model is trained, optimal values can be inferred online and injected directly into the iteration to accelerate convergence to a feasible and optimal solution. During prediction, we use a recurrent neural network (RNN) based on the Gated Recurrent Unit (GRU) [7] and argue that this is a sound choice based on the connection between ADMM and numerical schemes for solving a particular dynamical system [5, 6].
The use of ML in power systems is a growing area of research. Many straightforward applications of ML in OPF and related problems are either centralized in the sense that training requires complete knowledge of the system’s state and dynamics [8, 9, 10, 11], or intrusive in the sense that target models are created de novo or to replace an existing one; see, e.g., [12, 13, 14, 15, 16] for recent examples. Our approach requires neither centralization nor intrusiveness. In contrast to many works using ML for OPF, we use ML not to obtain the optimal solutions of the OPF itself; rather, we use ML to accelerate the process of solving the OPF. Importantly, this avoids infeasibility of the final solution, unlike ML+OPF methods which have no feasibility guarantees or require a feasibility post-processing step to ensure power flow constraints hold [8, 17, 18]. While the proposed method can leverage centralized models to accelerate the training process, it only requires access to the consensus variables for training and inference. Because predicted values are fed into the optimization models as iteration parameters, the method is distinctly non-intrusive.
In addition to the interest in ML increasing in the power systems community, interest in distributed optimization algorithms is also on the rise. This is partially due to the increasing prevalence of physically distributed, autonomous systems and the frequent intractability of centralized formulations that assume complete knowledge of the system’s state and dynamics. The optimal power flow problem, in particular, is amenable to the use of distributed methods as evidenced by several recent reviews [19, 20] and applications utilizing ADMM [21, 22, 23, 24, 25, 26, 27]. This paper combines ML with ADMM to leverage both the distributed nature of ADMM and the decrease in convergence time from using ML to estimate the dual and consensus variables.
The paper is organized as follows. In section II, we briefly summarize the components of the proposed method, namely, the DC-OPF problem, its distributed formulation via ADMM, and the RNN used for the prediction task. In section III we describe empirical experiments for three test systems and present the results of using standard ADMM versus LA-ADMM. Section IV discusses further issues, conclusions and future work.
II Methodology
II-A DC Optimal Power Flow Formulation
The aim of the DC-OPF is to determine the least-cost dispatch of generation that satisfies the load demand subject to power flow limits on the network lines. Using a DC approximation for the AC power flow equations leads to the following optimization problem:
$\underset{\mathbf{g},\bm{\theta}}{min}$ | $f(\mathbf{g})$ | (1) | ||
$\mathrm{s}.\mathrm{t}.$ | $\mathbf{H}\bm{\theta}+\mathbf{g}-\mathbf{d}=\mathrm{\U0001d7ce},$ | (2) | ||
$\underset{\xaf}{\mathbf{g}}\le \mathbf{g}\le \overline{\mathbf{g}},\underset{\xaf}{\mathbf{f}}\le \mathbf{K}\bm{\theta}\le \overline{\mathbf{f}},{\theta}_{0}=0.$ | (3) |
Here $\mathbf{g},\mathbf{d}\in {\mathbb{R}}^{n}$ are the active power generation and load at each of $n$ buses, respectively, and $\bm{\theta}\in {\mathbb{R}}^{n}$ collects the phase angles at each of $n$ buses, with ${\theta}_{0}$ denoting the angle at the reference bus. The matrix $\mathbf{K}:=\mathrm{\mathbf{B}\mathbf{A}}$, where $A\in {\mathbb{R}}^{m\times n}$ is a network directed graph incidence matrix, where $m$ is the number of lines in the network, and the diagonal matrix $\mathbf{B}\in {\mathbb{R}}^{m\times m}$ collects the line susceptances on the diagonal. Also, the matrix $\mathbf{H}\in {\mathbb{R}}^{n\times n}$ is defined as $\mathbf{H}:={\mathbf{A}}^{T}\mathrm{\mathbf{B}\mathbf{A}}$. Upper and lower bounds on $\mathbf{g}$ are denoted by $\overline{\mathbf{g}},\underset{\xaf}{\mathbf{g}}$, respectively. Similarly, upper and lower line flow limits are denoted by $\overline{\mathbf{f}}$ and $\underset{\xaf}{\mathbf{f}}$, respectively. The vector of all zeros, $\mathrm{\U0001d7ce}$, has dimension determined by context (here, $\mathrm{\U0001d7ce}\in {\mathbb{R}}^{n}$). The cost function $f:{\mathbb{R}}^{n}\to \mathbb{R}$ representing the cost of generation is assumed to be linear, i.e., $f(\mathbf{g})={\mathbf{c}}^{T}\mathbf{g}$ with non-negative cost vector, $\mathbf{c}\ge \mathrm{\U0001d7ce}$.
II-B Distributed DC-OPF Formulation via ADMM
In the distributed DC-OPF problem, the network is partitioned physically (i.e., into balancing areas) and/or computationally into subproblems that may share some number of the optimization variables. In this paper, we specifically consider the partitioning of buses into $S$ disjoint sets indexed by ${\mathcal{I}}_{s}$, such that ${\cup}_{s=1}^{S}{\mathcal{I}}_{s}=\{1,\mathrm{\dots},n\}$. In other words, a bus $i$ belongs to partition $s$ if and only if $i\in {\mathcal{I}}_{s}$. Such partitioning naturally leads to bus classification into buses that are connected only to buses inside the same partition, and buses that are connected to buses in other partitions. We denote the set of buses inside partition $u$ that are connected to buses in partition $s$ by ${\mathcal{J}}_{su}$, where $s,u$ refer to adjacent partitions. We use the terminology of public and private to distinguish between decision variables that are shared between partitions and those that are internal to a partition, respectively. Using the above notation, we note that public variables consist of phase angles at buses that are connected to buses in other partitions. Let the number of buses that are connected to buses outside their partitions be ${n}_{\text{pub}}$. In contrast, all generation decision variables and phase angles of buses that are only connected to buses inside the same partition are private decision variables. Thus, the number of these private variables is ${n}_{\text{pri}}=2n-{n}_{\text{pub}}$, where $2n$ corresponds to both bus voltage and generator setpoints at each private bus. Finally, we use subscript $s$ to denote the partition membership of decisions, and use ${\bm{\theta}}_{su}$ to denote the phase angles of buses in partition $u$ that are connected to buses in partition $s$.
The global DC-OPF problem (1)-(3) can be expressed in terms of the partitions by
$\underset{\mathbf{g},\bm{\theta}}{min}$ | $\sum _{s=1}^{S}}{f}_{s}({\mathbf{g}}_{s})$ | (4) | |||
$\mathrm{s}.\mathrm{t}.$ | ${\mathbf{H}}_{s}{\bm{\theta}}_{s}+{\displaystyle \sum _{u}}{\mathbf{H}}_{su}{\bm{\theta}}_{su}+{\mathbf{g}}_{s}-{\mathbf{d}}_{s}={\mathrm{\U0001d7ce}}_{s},$ | $\forall s$ | (5) | ||
${\underset{\xaf}{\mathbf{g}}}_{s}\le {\mathbf{g}}_{s}\le {\overline{\mathbf{g}}}_{s},$ | $\forall s$ | (6) | |||
${\underset{\xaf}{\mathbf{f}}}_{s}\le {\mathbf{K}}_{s}{\bm{\theta}}_{s}+{\displaystyle \sum _{u}}{\mathbf{K}}_{su}{\bm{\theta}}_{su}\le {\overline{\mathbf{f}}}_{s},$ | $\forall s$ | (7) | |||
${\bm{\theta}}_{su}={\mathbf{E}}_{su}{\bm{\theta}}_{u}$ | $\forall s,u$ | (8) |
where ${\mathbf{H}}_{su}$ denotes the sub-block of the matrix $\mathbf{H}$ that selects the rows ${\mathcal{I}}_{s}$ and columns ${\mathcal{J}}_{su}$. In other words, the constraint enforces power balance on all internal nodes subject to both internal and shared decision variables. The cost function is also partitioned over the network partitions where ${f}_{s}$ is function of only the generation at generation inside the $s$ partition. The matrix $\mathbf{K}$ is also partitioned such that for each partition the flow limits are respected for all lines connected to at least one bus in the partition. The matrices ${\mathbf{E}}_{su}$ is a selection matrix that selects the phase angles of buses in ${\mathcal{J}}_{su}$.
Were it not for the consensus constraints (8), the above problem (4)-(7) could be trivially decomposed into $S$ disjoint problems whose independent solution yields the global minimum. In the presence of these constraints, however, special handling is required to enforce these constraints. ADMM accomplishes this task via an augmented Lagrangian formulation (see the classic reference [2] for more details) that drives local copies of public variables into consensus with the partitions that share them. Algorithmically, consensus ADMM equates to the following iterative scheme consisting of $S$ independent primal optimizations followed by a centralized dual update,
$\mathrm{\{}{\mathbf{g}}_{s}^{(k+1)},{\bm{\theta}}_{s}^{(k+1)},{\bm{\theta}}_{su}^{(k+1)}\}=argmin{}_{{\mathbf{g}}_{s},{\bm{\theta}}_{s},{\bm{\theta}}_{su}\in {\mathcal{C}}_{s}}f{}_{s}({\mathbf{g}}_{s})$ | |||
$\mathrm{\hspace{1em}}+{\displaystyle \sum _{u}}\bm{\lambda}_{su}^{(k)}{}^{T}\left({\bm{\theta}}_{su}-{\overline{\bm{\theta}}}_{su}^{(k)}\right)+{\displaystyle \frac{\rho}{2}}{||{\bm{\theta}}_{su}-{\overline{\bm{\theta}}}_{su}^{(k)}||}_{2}^{2}$ | (9) | ||
${\overline{\bm{\theta}}}_{su}^{(k+1)}={\displaystyle \frac{1}{2}}\left({\bm{\theta}}_{su}^{(k+1)}+{\mathbf{E}}_{su}{\bm{\theta}}_{u}^{(k+1)}\right)$ | (10) | ||
${\bm{\lambda}}_{su}^{(k+1)}={\bm{\lambda}}_{su}^{(k)}+\rho \left({\bm{\theta}}_{su}^{(k+1)}-{\overline{\bm{\theta}}}_{su}^{(k+1)}\right).$ | (11) |
where we have introduced Lagrange multipliers ${\bm{\lambda}}_{su}\in {\mathbb{R}}^{|{\mathcal{J}}_{su}|}$ and used ${\mathcal{C}}_{s}$ to denote the local constraint set for partition $s$, i.e., constraints (5)-(7).
II-C Recurrent Neural Networks
As alluded to in Section I, one way to view the ADMM iteration is as a numerical integration scheme for solving the dynamical system, sometimes referred to as gradient flow,
$\frac{d}{dt}}u(t)=-\nabla f(u(t)).$ | (12) |
Note that steady states of this system coincide with local optima of $f$ since $du/dt=0\iff \nabla f=0$. This point of view has proven useful in several recent works analyzing the convergence and designing accelerated variants of ADMM [5, 6]. It is also empirically intuitive; see, e.g., Fig. 1 showing convergence of a subset of ADMM variables for the 14-bus system described in Section III.
Network | Buses | Branches | Partitions | ${n}_{pub}$ | $|{x}_{su}|$ | $\rho $ |
---|---|---|---|---|---|---|
IEEE 14 | 14 | 20 | 2 | 6 | 24 | 1 |
IEEE 118 | 118 | 186 | 4 | 22 | 88 | 100 |
RTE 2848 | 2848 | 3776 | 18 | 257 | 1028 | 100 |
In selecting a machine learning algorithm, we adopted the dynamical system interpretation of ADMM and only considered algorithms suitable for predicting sequential data. While many choices exist for this task, we selected the GRU [7] because it is a well established architecture in deep learning, is relatively lightweight in terms of number of parameters and, as a result, is extremely easy implement and train using modern open source software. While not formally reported in this paper, preliminary experiments also suggested that an RNN architecture was superior for this prediction task when compared with other off-the-shelf ML algorithms.
II-D LA-ADMM Algorithm
The LA-ADMM algorithm is a modification of standard ADMM in which the iteration is interrupted once to inject predicted values for the converged dual and consensus variables. We assume that an RNN has been previously trained using $K$ steps of ADMM for which the optimal solution is known, either by gathering data from fully convergent ADMM iterations or, where applicable, by solving a centralized version of the problem. Thus, the RNN takes as input the values ${\bm{\lambda}}_{su}^{(k)},{\overline{\bm{\theta}}}_{su}^{(k)}$ with $k=1,\mathrm{\dots},K$ and generates predictions for the converged values which we denote by ${\bm{\lambda}}_{su}^{*},{\overline{\bm{\theta}}}_{su}^{*}$. These predicted values overwrite the current iteration variables and become the de facto parameters for partition subproblems. Pseudocode for LA-ADMM is given in Algorithm 1.
A natural question is whether it would be beneficial to invoke the RNN prediction every $K$ steps rather than just once. While this seems like a reasonable idea, our experiments suggest that such a method suffers from accumulating prediction errors since, for all but the first prediction step, inputs to the RNN have been perturbed by previous prediction steps. Empirically we observed that this led to poor outcomes for the algorithmic settings we considered and so did not investigate it further.
III Experiments and Results
III-A Numerical experiments
III-A1 Test systems
The three test systems considered were the IEEE 14-bus, IEEE 118-bus, and RTE 2848-bus systems. Data for all three systems were obtained via the Power Grid Lib (pglib) repository [28]. Network properties were obtained directly from the repository, while load data was used to seed the training, as described in the following subsection. The partitioning for the 118-bus system was chosen in accordance with the decomposition in [29], which was designed to achieve a reasonable rate of convergence for Lagrangian-based decomposition methods. The RTE 2848-bus network was partitioned into $18$ partitions using spectral factorization of the network graph Laplacian [30]. Network properties for the partitioned systems are summarized in Table I.
III-A2 Simulation of load
Load for all three systems was sampled such that total load remained less than total generation capacity, but was seeded according to the system pglib load data. In particular, we first defined a characteristic load for each bus, ${\overline{d}}_{i}$, equal to the reported pglib value. We then used a global scaling factor, $\chi $, and bus-level scaling factor, ${\xi}_{i}$, to generate the load scenario for each bus via
${\stackrel{~}{d}}_{i}(\chi ,{\xi}_{i})$ | $=\chi (1+{\xi}_{i}){\overline{d}}_{i},\chi ,{\xi}_{i}\in U(0,1),\forall i.$ | (13) |
The random variables ${\xi}_{i}\in U(0,1)$ were sampled first, then $\chi $ was sampled uniformly from an interval that was small enough to ensure that the total load did not exceed generation.
III-A3 ADMM configuration
For all experiments we initialized the ADMM values ${\bm{\lambda}}_{su}^{(0)}$ and ${\bm{\theta}}_{su}^{(0)}$ to zero. The network partitions were used to define public buses and corresponding primal and dual variables whose cardinality is given in Table I. While a reasonably effective value of $\rho =100$ was chosen for the 118- and 2848-bus systems, we intentionally chose a value of $\rho =1$ for the 14-bus system that led to slow ADMM convergence. This choice enabled us to demonstrate the efficacy of LA-ADMM even for slowly converging iterations.
III-A4 Training and Evaluation
To generate RNN training data, we sampled instances of system load and solved the resulting DC-OPF problem to obtain converged values of the ADMM objective function as well as primal and dual variables. While we envision optimal values coming from converged ADMM iterations in any real-world scenario, it is also possible to accelerate training on simulated systems by running ADMM for a small number of iterations to gather what will be the input for the RNN and then using a centralized solution to extract prediction targets (i.e. converged Lagrange multipliers and consensus variables). We used the latter approach, running ADMM for no more than 10 iterations to obtain training inputs and using the centralized solution for prediction targets. Training set sizes were 1000, 4000, and 40,000 for the 14-, 118-, and 2848-bus systems, respectively.
Best practices for model selection include the use of cross validation, hyperparameter grid search, and regularization, to name a few. After extensive experimentation along these training dimensions, we adopted an approach that favored simplicity over accuracy with respect to a held-out test set. In particular, we identified a global set of hyperparameters and applied them to the RNN trained for each test system. The network architecture, illustrated in Fig. 7, consisted of a two-headed input layer corresponding to samples of the form ${({\bm{\lambda}}_{su}^{(k),\mathrm{\ell}},{\bm{\theta}}_{su}^{(k),\mathrm{\ell}})}_{\mathrm{\ell}=1}^{L}$ for $k=1,\mathrm{\dots},K$ initial ADMM steps from $\mathrm{\ell}=1,\mathrm{\dots},L$ trials. Inputs were concatenated, then passed through a GRU layer with 128 hidden units, a dense layer of 64 units, and a final two-headed output layer corresponding to targets ${({\bm{\lambda}}_{su}^{*,\mathrm{\ell}},{\bm{\theta}}_{su}^{*,\mathrm{\ell}})}_{\mathrm{\ell}=1}^{L}$. All layers except the linear output layer used ReLU activations, and an ${L}_{2}$-regularization penalty with coefficient ${10}^{-4}$ was applied to all weights. We observed that input sequences of length $K=4$ led to a good balance of data efficiency and prediction accuracy and used this value for all experiments. Backpropagation was performed with respect to mean squared error on a held-out validation set using the stochastic gradient descent via the Adam optimizer with a learning rate of ${10}^{-3}$ and stopped after 5 iterations of stagnation in the validation loss or after 50 epochs, whichever came first.
To evaluate performance, new realizations of load were generated using the procedure summarized by (13) and ADMM was run twice per sample: once, uninterrupted, for $100$ iterations to provide a pure ADMM baseline, and a second time with RNN predictions injected at step $k=4$, after which the iteration continues using the standard updates (9)-(11).
III-B Results
The histograms in Fig. 2 summarize convergence of ADMM and LA-ADMM with respect to the known optimal value of the objective function by binning the relative error in this quantity at the end of the iteration. We see that for the 14- and 118- bus test cases, LA-ADMM almost universally improves convergence by around two orders of magnitude over standard ADMM. The results for the 2848-bus test case are less striking, although even here we observe acceleration for the slowest converging samples. Fig. 3 illustrates the effect of injecting predicted values on the mean residual error over all test samples. Here we observe 1-2 orders of magnitude reduction in the residual on average but that this comes at the cost of higher variance. Taken together, however, Fig. 2-3 suggest that this is a compelling, data-driven approach to accelerating ADMM.
Fig. 4-6 provide a more detailed view of the convergence of ADMM and LA-ADMM for individual test samples. Fig. 4-5 suggest that the acceleration seen in Fig. 2-3 arise from LA-ADMM’s ability to push the iteration very close to the optimal solution using predicted values from the RNN. Analogously, Fig. 6 shows that LA-ADMM predictions are not as effective in this case. We speculate that the higher complexity of the modeled system and high dimensionality of the learning problem contribute to LA-ADMM being less effective in this case, but that the model could be improved by increasing the size of the training set and RNN model capacity. We hope to explore these ideas in future work.
IV Discussion, Conclusion and Future Work
While not necessary apparent today, the use of distributed optimization techniques such as ADMM are likely to be critical in future power systems operation. However, such decomposition comes with the drawback of the need to “iterate to self-consistency”. Our approach avoids this apparent jam by “pre-optimizing”; while there really is no free lunch, so we must do the work somewhere, by using computational cycles in an off-line learning mode we allow the online optimization for general inputs to be drastically sped up.
Generalizability: One may wonder not only whether our method generalizes well to all possible load vectors $\mathbf{d}$ (our numerical tests suggest that it does) but in what other ways it can generalize. For example, one can imagine both the network structure and generation/flow constraints changing (for the former, if at a day ahead level, say, certain units were not planned to be operating; for the latter, due to, say weather or maintenance). Machine learning models do not readily generalize outside of the domain they were trained on, so these other ways the problem could change would require retraining, but that is not an insurmountable problem. We will also need to consider whether and how this method can be transferred to the true, nonlinear optimal power flow formulation.
Learning to Optimize: The specific technique we have invented for this work is part of a growing realization that fertile ground for application of machine learning is not to replace optimization but to augment it. In the present case we are using learning to accelerate online optimization of the same sized problem we learn from. In other cases this idea has been used to learn ”heuristics” that allow for efficient solution of much larger problems than they were trained on [31] (this is not out of the question here, but it is beyond the scope of the present paper).
In any case, a benefit of such on approach that is hard to overemphasize is that the learned model is used within a generally convergent optimization scheme. With or without the learning, convergence provides a global, consistent, reliable measure of the quality of the solution. This helps alleviate a frequent and justified complaint that basing decisions on data-driven machine learning models is not appropriate in safety-critical systems.
References
- [1] D. Bertsekas and J. Tsitsiklis, Parallel and distributed computation: numerical methods. Prentice hall Englewood Cliffs, NJ, 1989, vol. 23.
- [2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine learning, vol. 3, no. 1, pp. 1–122, 2011.
- [3] T.-H. Chang, M. Hong, and X. Wang, “Multi-agent distributed optimization via inexact consensus ADMM,” IEEE Transactions on Signal Processing, vol. 63, no. 2, pp. 482–497, 2014.
- [4] R. Zhang and J. Kwok, “Asynchronous distributed admm for consensus optimization,” in International Conference on Machine Learning, 2014, pp. 1701–1709.
- [5] R. Nishihara, L. Lessard, B. Recht, A. Packard, and M. Jordan, “A general analysis of the convergence of ADMM,” in International Conference on Machine Learning, 2015, pp. 343–352.
- [6] G. França, D. Robinson, and R. Vidal, “ADMM and accelerated ADMM as continuous dynamical systems,” arXiv preprint arXiv:1805.06579, 2018.
- [7] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), p. 1724–1734, 2014.
- [8] A. Zamzam and K. Baker, “Learning optimal solutions for extremely fast AC optimal power flow,” arXiv preprint arXiv:1910.01213, 2019.
- [9] M. Chatzos, F. Fioretto, T. W. K. Mak, and P. V. Hentenryck, “High-fidelity machine learning approximations of large-scale optimal power flow,” arXiv preprint arXiv:2006.16356, 2020.
- [10] A. S. Zamzam, X. Fu, and N. D. Sidiropoulos, “Data-driven learning-based optimization for distribution system state estimation,” IEEE Transactions on Power Systems, vol. 34, no. 6, pp. 4796–4805, 2019.
- [11] Q. Yang, A. Sadeghi, G. Wang, G. B. Giannakis, and J. Sun, “Robust psse using graph neural networks for data-driven and topology-aware priors,” arXiv preprint arXiv:2003.01667, 2020.
- [12] S. Karagiannopoulos, P. Aristidou, and G. Hug, “Data-driven local control design for active distribution grids using off-line optimal power flow and machine learning techniques,” IEEE Transactions on Smart Grid, 2019.
- [13] R. Dobbe, O. Sondermeijer, D. Fridovich-Keil, D. Arnold, D. Callaway, and C. Tomlin, “Towards distributed energy services: Decentralizing optimal power flow with machine learning,” IEEE Transactions on Smart Grid, 2019.
- [14] A. Xavier, F. Qiu, and S. Ahmed, “Learning to solve large-scale security-constrained unit commitment problems,” arXiv preprint arXiv:1902.01697, 2019.
- [15] Q. Yang, G. Wang, A. Sadeghi, G. B. Giannakis, and J. Sun, “Two-timescale voltage control in distribution grids using deep reinforcement learning,” IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2313–2323, 2019.
- [16] A. S. Zamzam, B. Yang, and N. D. Sidiropoulos, “Energy storage management via deep Q-networks,” in 2019 IEEE Power & Energy Society General Meeting (PESGM), 2019, pp. 1–5.
- [17] X. Pan, T. Zhao, and M. Chen, “DeepOPF: Deep neural network for DC optimal power flow,” arXiv preprint arXiv:1905.04479, August 2019.
- [18] T. Zhao, X. Pan, M. Chen, A. Venzke, and S. H. Low, “DeepOPF+: A deep neural network approach for DC optimal power flow for ensuring feasibility,” arXiv preprint arXiv:2009.03147, 2020.
- [19] A. Kargarian, J. Mohammadi, J. Guo, S. Chakrabarti, M. Barati, G. Hug, S. Kar, and R. Baldick, “Toward distributed/decentralized DC optimal power flow implementation in future electric power systems,” IEEE Transactions on Smart Grid, vol. 9, no. 4, pp. 2574–2594, 2016.
- [20] D. Molzahn, F. Dörfler, H. Sandberg, S. Low, S. Chakrabarti, R. Baldick, and J. Lavaei, “A survey of distributed optimization and control algorithms for electric power systems,” IEEE Transactions on Smart Grid, vol. 8, no. 6, pp. 2941–2962, 2017.
- [21] T. Erseghe, “Distributed optimal power flow using ADMM,” IEEE transactions on power systems, vol. 29, no. 5, pp. 2370–2380, 2014.
- [22] Q. Peng and S. Low, “Distributed algorithm for optimal power flow on a radial network,” in 53rd IEEE Conference on decision and control, 2014, pp. 167–172.
- [23] P. Scott and S. Thiébaux, “Distributed multi-period optimal power flow for demand response in microgrids,” in Proc. of the 2015 ACM Sixth International Conf. on Future Energy Systems, 2015, pp. 17–26.
- [24] Y. Wang, L. Wu, and S. Wang, “A fully-decentralized consensus-based ADMM approach for DC-OPF with demand response,” IEEE Transactions on Smart Grid, vol. 8, no. 6, pp. 2637–2647, 2016.
- [25] M. Ma, L. Fan, and Z. Miao, “Consensus admm and proximal admm for economic dispatch and ac opf with socp relaxation,” in 2016 North American Power Symposium (NAPS), 2016, pp. 1–6.
- [26] Y. Zhang, E. Dall’Anese, and M. Hong, “Dynamic ADMM for real-time optimal power flow,” in 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2017, pp. 1085–1089.
- [27] Y. Zhang, M. Hong, E. Dall’Anese, S. Dhople, and Z. Xu, “Distributed controllers seeking AC optimal power flow solutions using ADMM,” IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4525–4537, 2017.
- [28] S. Babaeinejadsarookolaee, A. Birchfield, R. Christie, C. Coffrin, C. DeMarco, R. Diao, M. Ferris, S. Fliscounakis, S. Greene, and R. Huang, “The power grid library for benchmarking ac optimal power flow algorithms,” arXiv preprint arXiv:1908.02788, 2019.
- [29] K. Baker, J. Guo, G. Hug, and X. Li, “Distributed MPC for efficient coordination of storage and renewable energy sources across control areas,” IEEE Transactions on Smart Grid, vol. 7, no. 2, pp. 992–1001, Mar. 2016.
- [30] J. P. Hespanha, “An efficient matlab algorithm for graph partitioning,” Santa Barbara, CA, USA: University of California, 2004.
- [31] E. Khalil, H. Dai, Y. Zhang, B. Dilkina, and L. Song, “Learning combinatorial optimization algorithms over graphs,” in Advances in Neural Information Processing Systems, 2017, pp. 6348–6358.