ALADIN-α – An open-source MATLAB toolbox for distributed non-convex optimization



Introduction
Distributed non-convex optimization is of significant interest in various engineering domains, ranging from electrical power systems [1]-[4] and transportation problems [5] via machine learning [6] to distributed control [5], [7]-[9] and distributed estimation [10]-[13]. However, only few software toolboxes for distributed optimization are currently available. Moreover, these toolboxes are typically tailored to specific applications and often focus on convex problems. Examples comprise implementations of the Alternating Direction Method of Multipliers (ADMM) from Boyd et al. [6] (https://web.stanford.edu/~boyd/papers/admm/), an implementation of ADMM for consensus problems (http://users.isr.ist.utl.pt/~jmota/DADMM/), and a tailored implementation of ADMM for Optimal Power Flow (OPF) problems in Guo et al. [14] (https://github.com/guojunyao419/OPF-ADMM). However, there is a lack of multi-purpose software tools for distributed optimization and, to the best of the authors' knowledge, there are no generic toolboxes for both distributed and decentralized non-convex optimization.
Notice that we distinguish parallel and distributed optimization. In parallel optimization, the main motivations are computational speed-up and computational tractability, while reducing the amount of communication and central coordination is typically of secondary importance (due to shared-memory architectures). In distributed optimization, the main goal is to minimize central coordination and communication (distributed-memory architectures). Decentralized optimization additionally requires communication purely on a neighbor-to-neighbor basis. This is especially relevant in multi-agent settings, where individual entities cooperate to the end of optimization, control, or estimation, e.g. in the context of cyber-physical systems, IoT, or embedded control. Essentially decentralized optimization softens the requirement of pure neighbor-to-neighbor communication by allowing the global summation of scalars. [15]

For parallel optimization, efficient structure-exploiting tools exist. Classical tools include the GALAHAD software collection and in particular the LANCELOT algorithm, which is based on augmented Lagrangians and efficiently solves problems on shared-memory architectures. [16] A closed-source parallel interior-point software is OOPS. [17] The open-source package qpDUNES is tailored towards the time-wise decomposition of Quadratic Programs (QPs) arising in model predictive control. [18] PIPS is a collection of algorithms solving structured linear programs, QPs, and general Nonlinear Programming Problems (NLPs) in parallel. [19], [20] The software HiOp is tailored towards structured and very large-scale NLPs with few nonlinear constraints; it is based on interior-point methods. [21], [22] Moreover, combining parallel linear algebra routines (e.g. PARDISO [23]) with standard nonlinear programming solvers (e.g. IPOPT [24]) also leads to partially parallel algorithms. [25], [26] The tools mentioned above are implemented in low-level languages such as C or C++, leading to high computational performance. On the other hand, their focus is mainly computational speed-up via parallel computing rather than distributed and decentralized optimization in a multi-agent setting.
Classical distributed and decentralized optimization algorithms based on Lagrangian relaxation, such as dual decomposition or ADMM, are guaranteed to converge only for very specific non-convexities, typically appearing in the objective function of the optimization problem, and commonly at a sublinear/linear rate. [27]-[29] In many multi-agent applications, however, the non-convexities occur in the constraints. This implies that classical algorithms are not guaranteed to converge. [2], [8] One of the few algorithms exhibiting fast convergence guarantees in the non-convex case is the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) algorithm. [30] Yet, up to now, a publicly available software implementation of ALADIN has been missing. The present paper introduces an open-source MATLAB implementation of different ALADIN variants in the toolbox ALADIN-α. It is intended for rapid prototyping and aims at user-friendliness. The only user-provided information is objective and constraint functions; derivatives and numerical solvers are generated automatically using algorithmic differentiation routines and external state-of-the-art NLP solvers. A rich set of examples covering problems from robotics, power systems, sensor networks, and chemical engineering underpins the application potential of ALADIN-α. Besides the vanilla ALADIN algorithm, ALADIN-α covers recent extensions including:
• improved decentralization of bi-level ALADIN with essentially decentralized, respectively decentralized, variants of the Conjugate Gradient method (d-CG) and the Alternating Direction Method of Multipliers (d-ADMM) as inner algorithms; [1], [15]
• the nullspace ALADIN variant reducing communication and coordination; [1]
• a parametric implementation enabling distributed Model Predictive Control (MPC); and
• heuristics for Hessian regularization and parameter tuning for improving performance.
Moreover, we provide an implementation of ADMM based on the formulation of Houska et al., [30] which uses the same interface as ALADIN. This way, comparisons between ALADIN and ADMM are fostered. Moreover, ALADIN-α can be executed in parallel mode via the MATLAB Parallel Computing Toolbox. This often leads to a substantial speed-up, for example in distributed estimation problems. A documentation and many application examples of ALADIN-α are available under https://alexe15.github.io/ALADIN.m/. We remark that ALADIN-α is intended as a rapid-prototyping environment for testing distributed and decentralized algorithms for non-convex optimization based on ALADIN. At this stage, computational speed and real-time feasibility are beyond the scope of the toolbox. The remainder of the paper is organized as follows: Section 2 recalls the main ideas of ALADIN and bi-level ALADIN. In Section 3 we comment on the code structure and data structures and present a simple tutorial example. Numerical examples from chemical engineering, power systems, and sensor networks illustrate how to use ALADIN-α in different application domains in Section 4. The appendix provides implementation details.

Preliminaries
We start with a problem formulation amenable to distributed and decentralized optimization.

Problem Formulation
The ALADIN-α toolbox solves structured optimization problems of the form

min_{x_1,...,x_{n_s}} Σ_{i∈S} f_i(x_i) (1a)
subject to g_i(x_i) = 0 for all i ∈ S, (1b)
h_i(x_i) ≤ 0 for all i ∈ S, (1c)
x̲_i ≤ x_i ≤ x̄_i for all i ∈ S, (1d)
Σ_{i∈S} A_i x_i = b, (1e)

where S = {1, ..., n_s} is a set of subproblems. In each ALADIN iteration, local NLPs extend the objectives f_i by augmentation terms of the form λᵀ A_i x_i + (1/2)‖x_i − z_i‖²_{Σ_i}. These terms account for the coupling between the subproblems. Here, Σ_i ∈ R^{n_xi × n_xi}, Σ_i ≻ 0, are scaling matrices and z_i ∈ R^{n_xi} encodes the influence of other subproblems. Moreover, local non-convex constraints are considered in each subproblem i ∈ S. Since these subproblem-specific NLPs are similar in ALADIN and ADMM, both algorithms share the same computational complexity in the local step. Sensitivities such as the gradients of the local objectives ∇f_i(x_i), Hessian approximations B_i, and Jacobian matrices (∇g_i, ∇h_i) of the local constraints are evaluated locally. These sensitivities are combined in a sparse coordination QP adopted from SQP methods. Note that the coordination QP is equality-constrained and strongly convex (under certain regularity assumptions); thus it can be reformulated as a system of linear equations. The primal and dual solution vectors of this coordination QP are broadcasted to the local subproblems and the next ALADIN iteration starts. The algorithm terminates once the norm of the violation of the consensus constraint (1e) and the step size are both sufficiently small. The main advantages of standard ALADIN over other existing approaches are convergence guarantees and fast local convergence. [30] On the other hand, the coordination QP makes ALADIN distributed but not decentralized. Furthermore, the coordination step in standard ALADIN is computationally heavy and communication-intense compared with other algorithms such as ADMM. Bi-level ALADIN overcomes these drawbacks by constructing a coordination QP of smaller dimension, lowering communication. [1] Here, the sensitivities are "condensed" by computing the Schur complements of the local KKT systems, leading to {S_i}, {s_i}, which are of dimension n_c, the number of coupling variables.
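In each iteration, the local augmented subproblems take roughly the following form; this is a sketch following the standard ALADIN formulation from Houska et al. [30], with notation as in (1):

```latex
\min_{x_i}\; f_i(x_i) + (\lambda^k)^\top A_i x_i
  + \tfrac{1}{2}\,\lVert x_i - z_i^k\rVert_{\Sigma_i}^2
\quad \text{subject to} \quad
g_i(x_i) = 0,\quad h_i(x_i) \le 0,\quad
\underline{x}_i \le x_i \le \overline{x}_i .
```

Each subproblem thus only needs its own data (f_i, g_i, h_i, A_i) together with the current coordination variables z_i^k and λ^k.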
The number of coupling variables is typically much smaller than the total number of variables. These Schur complements are combined in a lower-dimensional QP, which is solved in a decentralized fashion purely based on neighborhood communication, leading to an overall decentralized algorithm. A simplified flow chart of bi-level ALADIN is shown in Figure 2. Observe that, in contrast to standard ALADIN (Figure 1), bi-level ALADIN solves the coordination QP in a decentralized fashion based on decentralized inner algorithms. ALADIN-α comes with two of these inner algorithms: an essentially decentralized version of the Conjugate Gradient method (d-CG) and a decentralized version of ADMM (d-ADMM). [15] The variables which have to be exchanged in the solution process of the lower-dimensional QP depend on the particular decentralized algorithm at hand. Although these decentralized inner algorithms do not solve the coordination problem exactly, bi-level ALADIN is still guaranteed to converge locally under certain bounds on the numerical precision. [1] A detailed description of ALADIN is given in Appendix A.1.

The ALADIN-α toolbox
This section presents the main contribution of this paper: the ALADIN-α toolbox implementing different ALADIN variants. We comment on its code structure and data structures. Moreover, we illustrate the usage of ALADIN-α on a tutorial example.

Code Structure
In order to simplify algorithm development and testing, we choose a procedural/functional programming style. All core features are implemented in MATLAB, enabling easy rapid prototyping. The overall structure of run_ALADIN(), the main function of ALADIN-α, is as follows: first, createLocSolAndSens() constructs the local NLP solvers and sensitivities for all subproblems i ∈ S. We use CasADi [31] for algorithmic differentiation and as an interface to many state-of-the-art NLP solvers such as IPOPT. [24] CasADi itself relies on pre-compiled code, making function and derivative evaluations fast.
A reuse option avoids the reconstruction of the CasADi problem setup, which enables the use of saved problem formulations.When the reuse mode is activated (e.g. when ALADIN-α is used within an MPC loop), createLocSolAndSens() is skipped, which results in a speed-up especially for large problems.
In the main loop iterateAL(), the function parallelStep() solves the local NLPs and evaluates the Hessian of the Lagrangian (or its approximation, e.g. when BFGS is used), the gradient of the objective, and the Jacobian of the active constraints (the sensitivities) at the NLP's solution. The set of active constraints is determined by the primal active-set detection described in Appendix A.1. Furthermore, a regularization procedure is executed if needed. Moreover, in case the nullspace method or bi-level ALADIN is used, the computation of a nullspace basis and of the Schur complement is performed locally, shifting substantial computational burden from the centralized coordination step to parallelStep(). The function updateParam() computes dynamically changing ALADIN parameters for numerical stability and speed-up.
The coordination QP is constructed in the function createCoordQP(). Different QP formulations are possible; here we use a variant considering slack variables from Houska et al. for numerical stability. [30] Different dense and sparse solvers for the coordination QP are available in solveQP(). Most of them are based on solving the first-order necessary conditions, which form a system of linear equations. Available solvers are the MATLAB linear algebra routines linsolve() and pinv(), and MA57. Using sparse solvers can speed up the computation substantially; note that only MA57 supports sparse matrices. The solver can be specified by setting the solveQP option. In case of convergence problems from remote starting points, it can help to reduce the primal-dual step size of the QP step by setting stepSize in the options to a value smaller than 1. More advanced step-size selection rules are subject to ongoing and future work.
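As a small illustration of these options, consider the following sketch; the option names solveQP and stepSize are taken from the text, while the struct-based call to run_ALADIN() follows the interface described in the next subsection:

```matlab
% sketch: choose the QP solver and damp the coordination step
% (assumes a problem struct sProb as described below)
opts = struct();
opts.solveQP  = 'MA57';   % sparse solver; linsolve()/pinv() are dense alternatives
opts.stepSize = 0.8;      % values < 1 can help from remote starting points
sol = run_ALADIN(sProb, opts);
```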

Data Structures
The data structure for defining problems in form of (1) is a struct called sProb (Figure 4). The second ingredient for ALADIN-α is an opts struct, which specifies the ALADIN variant and algorithm parameters. A full list of options with descriptions can be found in the code documentation. ALADIN-α returns a struct as output. This struct contains the cell xxOpt with the local minimizers {x_i}_{i∈S} and the optimal Lagrange multipliers λ of the consensus constraints (1e). Moreover, the field iter contains information about the ALADIN iterates, such as the primal/dual iterates, and timers collects timing information. Note that run_ALADIN() and run_ADMM() have the same function signature in terms of sProb; only the options differ.
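A minimal sketch of these input and output structures follows; the field names ffi, llbx, uubx, AA, xxOpt, iter, and timers appear in the text, whereas the names of the constraint cells are assumptions made for illustration:

```matlab
% sketch: assembling sProb and reading the results
sProb.locFuns.ffi = {f1, f2};   % local objectives as a cell
sProb.locFuns.ggi = {g1, g2};   % local equality constraints (field name assumed)
sProb.locFuns.hhi = {h1, h2};   % local inequality constraints (field name assumed)
sProb.llbx = {lb1, lb2};        % lower bounds (1d)
sProb.uubx = {ub1, ub2};        % upper bounds (1d)
sProb.AA   = {A1, A2};          % consensus matrices (1e)

sol  = run_ALADIN(sProb, opts); % run_ADMM(sProb, opts) has the same signature
xOpt = sol.xxOpt;               % cell of local minimizers {x_i}
it   = sol.iter;                % iterate information; sol.timers holds timings
```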

Further Features
We describe selected features of ALADIN-α; a full description of all features can be found in the online documentation (https://alexe15.github.io/ALADIN.m/).
Hessian Approximations Instead of exact Hessians, approximations such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update can be used, either to reduce communication and/or to reduce the computational complexity of the sensitivity computation. The BFGS Hessian is activated by setting the Hess option either to BFGS for standard BFGS or to DBFGS for damped BFGS. For details on BFGS we refer to the book of Nocedal and Wright. [32]

Parametric NLP Setup A parametric problem setup, where the objective functions f_i and the equality/inequality constraints g_i/h_i depend on parameters p_i, is possible. This feature is useful in combination with the reuse option, which returns the internally constructed CasADi solvers and derivatives. If one provides a previously constructed NLP as input argument when calling run_ALADIN(), the problem construction is skipped, which can lead to a substantial speed-up. In an MPC setting, for example, the parameter p_i models the changing initial condition in the MPC loop. Moreover, parametric problem data might be useful for large-scale problems where one would like to solve an optimization problem for a wide range of parameters. This feature is activated by adding a parameter cell p to sProb and defining the objective/constraints in terms of two inputs, x_i and p_i. An example illustrating how to use these features for distributed predictive control of two mobile robots is given in the code repository.

Parallelization ALADIN-α also supports parallel computing on multiple processors via the MATLAB Parallel Computing Toolbox. Here, we exploit the fact that the local NLPs are independent of each other, i.e., they can be solved in parallel. An example for distributed nonlinear estimation with mobile sensor networks can be found in Section 4. Parallel computing can be activated by setting the parfor option to true.
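The parametric and parallel features might be combined as in the following sketch; the parameter cell p, the two-input form of the objectives, and the parfor option are taken from the text, while the concrete objectives and the handling of the reuse output are illustrative assumptions:

```matlab
% sketch: parametric objectives with two inputs (x_i, p_i) and parallel local steps
sProb.locFuns.ffi = { @(x1, p1) (x1 - p1)^2, ...
                      @(x2, p2) (x2(1) - p2)^2 };  % illustrative objectives
sProb.p     = {0.5, 1.0};   % parameter cell p, one entry per subproblem
opts.parfor = true;         % solve the independent local NLPs in parallel

sol = run_ALADIN(sProb, opts);
% in an MPC loop, the returned problem setup can be passed back to run_ALADIN()
% so that createLocSolAndSens() is skipped (reuse mode)
```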

Application Examples
We provide numerical examples highlighting the applicability of ALADIN-α to a wide range of problems. The code for all these examples is available in the examples\ folder of ALADIN-α. Furthermore, we provide descriptions of these examples in the online documentation. Beyond the examples of this section, we consider distributed optimal control and the application of ALADIN-α to test problems from the Hock-Schittkowski test collection in the online repository. [1], [33], [34] A list of all examples is given in Table 1.

A Tutorial Example
Consider the non-convex NLP (2). In order to apply ALADIN-α, we reformulate problem (2) in form of (1). We introduce auxiliary variables y_1, y_2 with y_1 ∈ R and y_2 = (y_21 y_22)ᵀ. We couple these variables again by introducing a consensus constraint Σ_i A_i y_i = 0 with A_1 = 1 and A_2 = (−1 0). Furthermore, we reformulate the objective function f in terms of local objective functions f_1(y_1) and f_2(y_2), yielding problem (3), which is in form of (1). Note that the solutions to (2) and (3) coincide, but (3) is of higher dimension. This reformulation reveals a general strategy for reformulating problems in form of (1): if there is nonlinear coupling in the objective functions or the constraints, introduce auxiliary variables and require them to coincide via an additional consensus constraint in form of (1e). The objective functions and the matrices A_i are collected in sProb. We call run_ALADIN() with an empty options struct, leading to computation with default parameters. The code and the resulting ALADIN-α report after running run_ALADIN() are shown in Figure 6. In the ALADIN-α report, the reason for termination and timing information are displayed. Figure 7 shows the output of ALADIN-α while it is running. The figures show (in this order) the consensus violation ‖Ax − b‖∞, the local step sizes ‖x^k − z^k‖∞, the step size in the coordination step ‖∆x^k‖∞, and the changes in the active set. Note that online plotting may consume a substantial amount of time; hence it is advisable to deactivate online plotting if it is not required, e.g., for diagnostic reasons.
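For a problem reformulated along these lines, the setup might look as follows; this is a hedged sketch in which the local objectives are illustrative placeholders (the explicit NLP (2) is not reproduced here), while the fields ffi and AA and the call to run_ALADIN() follow the text:

```matlab
% sketch: two subproblems coupled by A1*y1 + A2*y2 = 0 with A1 = 1, A2 = (-1 0)
f1 = @(y1) 2*(y1 - 1)^2;            % placeholder local objective f_1(y_1)
f2 = @(y2) (y2(2) - 2)^2;           % placeholder local objective f_2(y_2)

sProb.locFuns.ffi = {f1, f2};
sProb.AA          = {1, [-1, 0]};   % consensus: y1 - y21 = 0

sol = run_ALADIN(sProb, struct()); % empty options: default parameters
```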

Numerical case studies
We present three case studies to shed light on the differences between the implemented algorithms. We consider an optimal control problem for a chemical reactor, an OPF problem, and a sensor localization example.

Distributed Optimal Control of a Chemical Process System
We consider a discrete-time optimal control problem (OCP) for a chemical process system. This OCP can serve as a basis for distributed model predictive control. [35]-[37] The process consists of two Continuous Stirred-Tank Reactors (CSTRs) and a flash separator, shown in Figure 8. [38], [39] The goal is to steer the system to the optimal setpoint u_s = (0 0 0)ᵀ and x_s = (369.53 3.31 0.17 0.04 435.25 …)ᵀ. After applying a fourth-order Runge-Kutta scheme for discretization, the dynamics of all CSTRs and the flash separator are given by x_i^{k+1} = q_i(x_i^k, u_i^k, z_i^k), where q_i : R^{n_xi} × R^{n_ui} × R^{n_zi} → R^{n_xi} are the dynamics of the ith vessel with S := {1, 2, 3} being the set of vessels. Here, x_i = (x_Ai, x_Bi, x_Ci, T_i) are the states, x_Ai, x_Bi, x_Ci are the concentrations of the reactants A, B, and C, and T_i is the temperature. The input u_i = Q_i denotes the heat influx of the ith vessel, and z_i := (x_j)_{j∈N(i)} are copied states of all neighbors N(i) ⊆ S. Note that the feed-stream flow rates F_10, F_20, F_3, F_R, and F_p are fixed and given. A detailed description of the system dynamics is given in Christofides et al. [39] With the above, we formulate a discrete-time optimal control problem (4), where the discretized dynamics (4b) and the coupling constraints (4c) hold for all k ∈ I_[1,T] and all i ∈ S, with lower/upper bounds on the inputs ū = −u̲ = (5·10^4 1.5·10^5 2·10^5)ᵀ, and lower bounds on the states x̲_i^k = 0 for all times k ∈ I_[1,T] and all vessels i ∈ S. The weighting matrices are Q_i = diag(20 10^3 10^3 10^3) and R_i = 10^{-10}. The matrices A_i are constructed to model the coupling z_i := (x_j)_{j∈N(i)}. The sampling time is ∆h = 0.01 h and the horizon is T = 10 h. By defining stacked variables accordingly, (4) is in form of (1), where x̃_i corresponds to x_i in (1).

Numerical Results
Figure 10 shows the convergence behavior of standard ALADIN, of bi-level ALADIN with decentralized conjugate gradients (d-CG) as inner algorithm, of bi-level ALADIN with decentralized ADMM (d-ADMM) as inner algorithm, and of ADMM over the iteration index k. Specifically, we depict the distance to a minimizer ‖x^k − x*‖∞, the consensus violation ‖Ax^k − b‖∞, and the optimality gap |f(x^k) − f(x*)|. Note that for the considered problem, ADMM is not guaranteed to converge because of the nonlinear dynamics. However, since ADMM is nevertheless used in many works, we use it as a baseline for comparison. [40]-[42] Bi-level ALADIN with d-ADMM is executed with n_inner ∈ {50, 100, 200} inner d-ADMM iterations, and bi-level ALADIN with d-CG is executed with 200 inner d-CG iterations. One can see that ADMM converges fast, and there seems to be no benefit in using bi-level ALADIN with d-ADMM as inner algorithm here. Basic ALADIN and bi-level ALADIN with d-CG converge faster, but one has to solve an expensive coordination step in case of basic ALADIN, or to perform many inner iterations in case of bi-level ALADIN with d-CG.
Figure 9 shows the resulting open-loop input and state trajectories for OCP (4) for ALADIN and ADMM after 20 iterations, and for ADMM after 100 iterations. At first glance, all trajectories look quite similar. However, small differences in the input trajectories can be observed. Close inspection of Figure 10 shows that in logarithmic scale the differences can be large. For example, the consensus gap ‖Ax − b‖∞ is on the order of 10^{-1} after 20 iterations, which means that the physical values at the interconnection points have a maximum mismatch of 10^{-1}.

Distributed Optimal Power Flow
Next, we consider an OPF problem, which is one of the most important optimization problems in power systems. [43] Distributed optimization is particularly important here due to large problem sizes and the necessity of a reduced information exchange between subsystems. We consider the IEEE 118-bus test case shown in Figure 11, which comprises about 500 decision variables. A detailed problem description matching (1) is beyond the scope of this paper; details on this and on the partitioning scheme are given in Engelmann et al. [44]

Numerical Results Figure 12 shows the performance of all distributed and decentralized optimization algorithms coming with ALADIN-α. Bi-level ALADIN with d-ADMM is executed with n_inner ∈ {50, 100, 200} inner d-ADMM iterations, and bi-level ALADIN with d-CG is executed with 70 inner iterations. One can see that, in contrast to the chemical reactor from the previous subsection, ADMM converges quite slowly and requires about 1,500 iterations to reach an acceptable level of accuracy. This underlines that the performance of ADMM is problem-dependent, especially in a setting with non-convex constraints. Basic ALADIN and bi-level ALADIN with d-CG, on the other hand, converge rapidly and to a high accuracy. For bi-level ALADIN with d-ADMM, the achievable accuracy depends on the number of inner d-ADMM iterations. Table 2 shows timing information for all algorithms, run until ‖Ax^k − b‖∞ < 10^{-3}. We use a computer with an Intel Core i7-8550U processor with 4 cores, 16 GiB of memory, and MATLAB R2020a running Arch Linux with parallel computing disabled. The initialization phase of the sensitivities is not considered. One can see that there is not much difference between the ALADIN variants, since most of the time is spent solving the local NLPs and this step is the same for all of them. ADMM is about five times slower, since it requires many more iterations and thus many more NLP solutions.

Summary & Future Work
This paper has introduced one of the first open-source toolboxes for distributed non-convex optimization: ALADIN-α. It is based on the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) algorithm and implements various extensions, mostly aiming at reducing communication and coordination overhead. Moreover, ALADIN-α comes with a rich set of examples from different engineering fields, ranging from power systems via nonlinear control to mobile sensor networks.
Although ALADIN-α performs well for many small to medium-sized problems, we aim at further improving numerical stability in future work by developing more advanced internal auto-tuning routines. Furthermore, developing distributed globalization strategies to enlarge the set of admissible initializations seems important and promising.

4. Consensus Step: Solve the coordination QP

min_{∆x, s} Σ_{i∈S} { (1/2) ∆x_iᵀ B_i^k ∆x_i + (g_i^k)ᵀ ∆x_i } + (λ^k)ᵀ s + (1/2)‖s‖²_{∆^k}
subject to Σ_{i∈S} A_i (x_i^k + ∆x_i) = b + s and C_i^k ∆x_i = 0 for all i ∈ S, (6)

and return ∆x^k and λ^{QP,k}.
5. Line Search: Update the primal and dual variables based on the QP solution (∆x^k, λ^{QP,k}), with α^k = 1 for the full-step variant. Update Σ_i^k and ∆^k.

A.1. ALADIN in Detail
Standard ALADIN is summarized in Algorithm 1. Each ALADIN iteration executes three main steps. Step 1 solves the local NLPs (5) for fixed and given values of the primal iterates z_i^k and dual iterates λ^k in parallel. The parameter sequences {Σ_i^k} ≻ 0 and {∆^k} ≻ 0 are user-defined; details are described in Subsection A.4. Note that the equality constraints (1b) and box constraints (1d) are not explicitly detailed in Houska et al.; [30] we consider them separately here for numerical efficiency reasons. Step 2 of Algorithm 1 computes sensitivities such as the gradients of the objective functions and positive definite approximations of the local Hessian matrices (7). Here, A_i^k denotes the set of active inequality constraints in subproblem i ∈ S, and τ > 0 is a user-defined margin which can be specified via the actMargin option. Moreover, we define combined inequality constraints and Jacobians of the active constraints C_i^k for all i ∈ S. With this information, step 4 of Algorithm 1 solves an equality-constrained quadratic program (6) serving as coordination problem.
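The primal active-set detection with margin τ (the actMargin option) can be sketched as follows; variable names are illustrative:

```matlab
% sketch: detect active inequality constraints at the local NLP solution x_i^k
hval = hi(xk);               % evaluate local inequality constraints h_i
Aset = find(hval > -tau);    % indices within the margin tau count as active
Cact = [Jg; Jh(Aset, :)];    % Jacobian of equalities and active inequalities
```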
Step 5 of Algorithm 1 updates z^k and λ^k based on the solution to (6). To achieve global convergence guarantees, the step-size parameter α ∈ (0, 1] has to be chosen properly by a globalization routine. Designing suitable distributed globalization routines is subject of ongoing and future work; we use the full-step variant α = 1. A smaller step size can be specified via the stepSize option, which might stabilize ALADIN-α for certain problems. Note that time-varying parameter sequences {∆^k} and {Σ_i^k} with Σ_i^k, ∆^k ≻ 0 might accelerate convergence of ALADIN in practice. Heuristic routines for doing so are described in Subsection A.4.

A.2. Solving the Coordination QP
The Hessian approximations B_i^k are assumed to be positive definite. Hence, problem (6) is a strictly convex equality-constrained QP, which can be solved via the first-order optimality conditions (if C_i^k has full row rank), i.e., a system of linear equations. There are two possibilities for solving (6) numerically: centralized linear algebra routines or iterative methods. For centralized computation, several solvers are interfaced in ALADIN-α, which can be specified via the solveQP option. The available solvers are summarized in Table 3. Note that not all solvers support sparse matrices. MA57 usually performs very well in practice, both in terms of speed and robustness. The second approach to solving (6) is via iterative and decentralized routines such as d-CG and d-ADMM. Details of these decentralized routines are described in Subsection A.6.

A.3. Hessian Approximations
As B_i^k may have zero eigenvalues or may even be indefinite if evaluated via (7), special care has to be taken. Here we use a heuristic based on ideas from Nocedal and Wright; [32] other heuristics are possible and might accelerate convergence. Our heuristic "flips" the sign of the negative eigenvalues (if there are any) and sets the zero eigenvalues to a small positive number δ. The intuition is that the step size in a direction of negative curvature becomes smaller the "more negative" the curvature is. To do so, we compute the eigendecomposition B_i^k = V_i Λ_i V_iᵀ for each subproblem i locally, where Λ_i is a matrix with the eigenvalues of B_i^k on its main diagonal and V_i is the matrix of eigenvectors. The regularization then replaces Λ_i by its entry-wise absolute value, with zero eigenvalues set to δ = 10^{-4}. Regularization can be activated by the option reg, and δ can be specified via regParam.
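The eigenvalue-flipping heuristic described above might be sketched as follows, assuming a symmetric Hessian block B and the default δ = 10^{-4}:

```matlab
% sketch: regularize B_i^k via its eigendecomposition B = V*Lam*V'
delta    = 1e-4;
[V, Lam] = eig(full(B));
lam      = abs(diag(Lam));      % flip the sign of negative eigenvalues
lam(lam < delta) = delta;       % lift (near-)zero eigenvalues to delta
Breg     = V * diag(lam) * V';  % positive definite regularized Hessian
```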
As an alternative to exact Hessians with regularization, one can use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update for successively approximating the exact Hessian based on the gradient of the Lagrangian. This has the advantage that only the gradient of the Lagrangian (a vector) has to be communicated instead of the Hessian (a matrix). A detailed description of how to use BFGS within ALADIN can be found in Engelmann et al. [44] The BFGS formula can be activated by setting the option Hess to BFGS, or to DBFGS for damped BFGS. The advantage of damped BFGS is that it guarantees positive definiteness of B_i^k regardless of the positive definiteness of the exact Hessian at the current iterate. Note that in case the nullspace method is used (cf. Subsection A.5), the regularization is applied to the reduced Hessian B̄_i^k instead of B_i^k.

A.4. Scaling Matrices
A simple heuristic for the sequences {Σ_i^k} ≻ 0 and {∆^k} ≻ 0 is to start with certain (usually diagonal) initial matrices Σ_i^0, ∆^0 and to multiply them by factors r_Σ, r_∆ > 1 in each iteration. These routines have been successfully used in previous works. [1], [44] An alternative for choosing {∆^k} is based on the consensus violation of each individual row of (1e). The idea is to increase the corresponding ∆_{ii}^k in order to drive the corresponding consensus violation to zero. This technique is common in algorithms based on augmented Lagrangians, cf. Bertsekas. [46, Chap. 4.2.2] Mathematically, this means that ∆_{ii}^k is increased by the factor β whenever the violation of the ith row of (1e) has not decreased by at least the factor γ, with γ ∈ (0, 1) and β > 1. In ALADIN-α we choose β = 10 and γ = 0.25. This rule can be activated by the option DelUp and is able to accelerate convergence of ALADIN-α substantially in some cases. Note that the above heuristics, such as regularization or parameter updates, do not interfere with the fast local convergence properties of ALADIN-α. Rather, they help to guarantee fast local convergence, since they ensure that the assumptions made in the local convergence proof of ALADIN, such as the positive definiteness of B̄_i^k, are satisfied. [30]

A.5. The Nullspace Method
The nullspace method can be activated to reduce the dimensionality of the coordination QP (6), thus reducing communication and computation in the coordination step. The idea is to parametrize the nullspace of the active constraint Jacobians, null(C_i^k), by basis matrices Z_i^k, leading to the reduced quantities B̄_i^k = (Z_i^k)ᵀ B_i^k Z_i^k, ḡ_i^k = (Z_i^k)ᵀ g_i^k, and Ā_i^k = A_i Z_i^k in the reduced QP (12). Note that Ā_i^k carries an iteration index k and changes during the iterations, since Z_i^k changes. Similar to the full-space approach, the regularization from Subsection A.3 is used (if it is activated via the option reg), yielding a positive definite B̄_i^k. The nullspace method can be used by activating the option redSpace. Notice that the required communication between the subproblems and the coordinator is reduced by twice the number of equality constraints and active inequality constraints. Thus, the communication reduction can be large for problems with many constraints. Furthermore, the coordination QP (12) is in general less expensive to solve, since it is of smaller dimension than (6). Indeed, (12) is strongly convex under suitable assumptions, which (6) is not necessarily. [1] While computing nullspace bases is numerically expensive (due to a singular value decomposition), it is done in parallel in our context, thus fostering parallelization.

A.6. Bi-level ALADIN
Bi-level ALADIN is an extension of ALADIN which further reduces the dimensionality of the coordination QP (12). Moreover, it enables the use of decentralized ADMM or essentially decentralized conjugate gradients as inner algorithms, leading to an overall (essentially) decentralized ALADIN variant.
We briefly recall the main idea of bi-level ALADIN. Under the assumptions from Engelmann et al., [1] evaluating the KKT conditions of (12) yields

B̄^k ∆v + ḡ^k + (Ā^k)ᵀ λ^{QP} = 0, (13a)

together with the consensus constraint of (12), where B̄^k, Ā^k, and ∆v^k are block-wise concatenations of B̄_i^k, Ā_i^k, and ∆v_i^k. Using the Schur complement reveals that (13) is equivalent to solving the system of linear equations

( Σ_{i∈S} S_i^k ) λ^{QP} = Σ_{i∈S} s_i^k − b, (14)

where S_i^k := Ā_i^k (B̄_i^k)^{-1} (Ā_i^k)ᵀ and s_i^k := Ā_i^k v_i^k − Ā_i^k (B̄_i^k)^{-1} ḡ_i^k. The key observation for decentralization is that the matrices S_i^k and vectors s_i^k inherit the sparsity pattern of the consensus matrices A_i, i.e., zero rows in A_i yield zero rows/columns in S_i^k and s_i^k. Intuitively speaking, each row/column of S_i^k corresponds to one consensus constraint (row of (1e)), and only the subproblems which "participate" in this constraint have non-zero rows in their corresponding S_i. This sparsity can be exploited to solve (14) in a decentralized fashion. Examples of such algorithms are decentralized ADMM and the essentially decentralized conjugate gradient algorithm presented in Engelmann and Faulwasser. [15]
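For illustration, forming the local Schur complements and solving (14) by a centralized linear solve might look as follows; this is a sketch with assumed cell arrays Bbar, Abar, gbar, v holding B̄_i^k, Ā_i^k, ḡ_i^k, v_i^k (in bi-level ALADIN the final solve is replaced by d-CG or d-ADMM):

```matlab
% sketch: Schur-complement based coordination step (14)
nc   = length(b);                           % number of consensus constraints
Ssum = zeros(nc);  rhs = -b;
for i = 1:ns
    Si   = Abar{i} * (Bbar{i} \ Abar{i}');       % S_i = Abar_i*inv(Bbar_i)*Abar_i'
    si   = Abar{i} * (v{i} - Bbar{i} \ gbar{i}); % s_i = Abar_i*(v_i - inv(Bbar_i)*gbar_i)
    Ssum = Ssum + Si;  rhs = rhs + si;
end
lamQP = Ssum \ rhs;                         % dual solution of the coordination QP
```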

Figure 1: Simplified flow chart of standard ALADIN.

Figure 4: The sProb data structure for defining problems in form of (1).

Figure 11: Map of the IEEE 118-bus test system.

Here, S_i^k := Ā_i^k (B̄_i^k)^{-1} (Ā_i^k)ᵀ ⪰ 0 are the local Schur-complement matrices and s_i^k := Ā_i^k v_i^k − Ā_i^k (B̄_i^k)^{-1} ḡ_i^k are the local Schur-complement vectors appearing in (14).
The sProb struct collects the objective functions {f_i}_{i∈S} and the constraint functions {g_i}_{i∈S} and {h_i}_{i∈S} in cells, which are contained in a nested struct called locFuns. Furthermore, sProb collects the lower/upper bounds (1d) in the cells llbx and uubx. The coupling matrices {A_i}_{i∈S} are collected in AA. Optionally, one can provide NLP solvers and sensitivities; in this case the problem construction in createLocSolAndSens() is skipped, leading to a speed-up for large problems. This way, problem setups can be saved and reused. For a minimal working example of ALADIN-α, one only needs to specify ffi and AA. Optionally, one can provide initial guesses zz0 and initial Lagrange multipliers lam0.

Table 3: Centralized QP solvers interfaced in ALADIN-α.

If one would use (7) regardless, the coordination step (6) would not necessarily produce descent directions, destroying the local convergence properties of ALADIN. Moreover, in case of zero eigenvalues, B_i^k is singular and the coordination step cannot be solved by a standard solver for linear systems of equations.
Here, Z_i^k ∈ R^{n_xi × (n_xi − |A_i^k|)} is a matrix whose columns form a basis of null(C_i^k). Note that C_i^k Z_i^k = 0 by definition of the nullspace. Using this parametrization, (6) can be written as

min_{∆v, s} Σ_{i∈S} { (1/2) ∆v_iᵀ B̄_i^k ∆v_i + (ḡ_i^k)ᵀ ∆v_i } + (λ^k)ᵀ s + (1/2)‖s‖²_{∆^k}
subject to Σ_{i∈S} Ā_i^k (v_i^k + ∆v_i) = b + s. (12)