Index-aware learning of circuits

Summary Electrical circuits are present in a variety of technologies, making their design an important part of computer aided engineering. The growing number of tunable parameters that affect the final design leads to a need for new approaches to quantifying their impact. Machine learning may play a key role in this regard; however, current approaches often make suboptimal use of existing knowledge about the system at hand. For circuits, their description via modified nodal analysis is well understood. This particular formulation leads to systems of differential-algebraic equations (DAEs), which bring with them a number of peculiarities, e.g. hidden constraints that the solution needs to fulfill. We aim to use the recently introduced dissection concept for DAEs, which can decouple a given system into ordinary differential equations, depending only on differential variables, and purely algebraic equations that describe the relations between differential and algebraic variables. The idea then is to learn only the differential variables and reconstruct the algebraic ones using the relations from the decoupling. This approach guarantees that the algebraic constraints are fulfilled up to the accuracy of the nonlinear system solver, which represents the main benefit highlighted in this article.


INTRODUCTION
Design optimization and uncertainty quantification are key tools of modern computer aided engineering that both rely on objective functions to express quantities of interest in terms of the variables of the underlying system. Due to the increasing complexity of engineering systems, machine learning approaches have gained popularity for constructing surrogate models of objective functions when they become expensive to evaluate and a large number of (design or uncertainty) parameters are present. In such situations, classical model order reduction techniques 1 or function approximation approaches 2 suffer from the curse of dimensionality: the number of operations to construct and the memory required to store the surrogate model grow exponentially with respect to the number of parameters. Experimental and in some cases even theoretical evidence 3 shows that machine learning approaches may be able to overcome this curse of dimensionality and provide surrogate models that are fast to evaluate, while requiring comparably little data for their construction and storage.
In the context of electrical circuit design, neural networks have been used for design optimization for over 20 years 4,5. More recently, Gaussian process regression has been employed for both uncertainty quantification and design optimization during analog integrated circuit design 6,7. The commonality between these approaches is that they focus on the learning part: they all aim to provide a computationally efficient and accurate surrogate model, given data produced by some circuit simulator. Thus, they all treat the circuit simulator as a black box that simply provides the data which is then used for constructing the surrogate. In contrast, we want to exploit the known structure that underlies the equations describing electrical circuits.
More specifically, we consider the modified nodal analysis 8 (MNA). MNA is one of the most popular circuit descriptions and lies at the center of SPICE-like simulation software such as LTspice 9, Xyce 10 and PSpice 11. Applying MNA to a given circuit leads to systems of differential-algebraic equations (DAEs), which can generally be written as systems of implicit differential equations 12

f(x′, x, t, p) = 0,   x(0) = x₀,   (1)

where the Jacobian ∂f/∂x′ of f w.r.t. x′ is singular and p are the design or uncertainty parameters. Intuitively, one can think of DAEs as ordinary differential equations (ODEs) that are constrained to a manifold determined by (hidden) constraints on the solution variables x. We aim to exploit the special structure of the DAEs arising from MNA by using the dissection index 13 to propose an approach for learning electrical circuits more accurately and efficiently. More concretely, we use the dissection index to decouple the DAEs into sets of ODEs and purely algebraic equations, such that the entire dynamics of the solution may be found using only the ODEs, while the algebraic equations may be used to recover the entire solution.
In the following, section 2 introduces MNA and DAEs in more detail and states some well-known results. Afterwards, section 3 outlines the dissection index and showcases its properties using example circuits. The new approach is then presented on an abstract level in section 4, and on a numerical level in section 5. Preliminary conclusions about the effectiveness of the approach and future research directions are given in section 6.

MNA AND DAES
We first look at the system of DAEs that results from MNA when not considering controlled sources. As they are of crucial importance for engineering applications however, we will note whether extensions including controlled sources are available or missing at the appropriate times. Borrowing the notation from Tischendorf 14, the system of MNA reads

A_C (d/dt) q_C(A_C^⊤ φ, t) + A_R g_R(A_R^⊤ φ, t) + A_L j_L + A_V j_V + A_I i_s(t) = 0,
(d/dt) φ_L(j_L, t) − A_L^⊤ φ = 0,   (2)
A_V^⊤ φ − v_s(t) = 0,

where the left hand side as a whole corresponds to f in (1), and the solution variables are given by x = [φ, j_L, j_V]^⊤. The functions g_R, φ_L and q_C model resistive, inductive and capacitive devices respectively, and may each depend on the parameters p. The terms for independent current and voltage sources are given by i_s(t) and v_s(t), while φ denotes the vector of nodal potentials and j_L, j_V are the currents flowing through branches containing inductors or voltage sources respectively. The last ingredient is given by the incidence matrices A_*, where * indicates the device type. These collect the branch to node relations of the underlying electrical network, when considering the branches and nodes as edges and vertices of a directed graph.
In order to obtain a version of (2) that is better suited to analysis and implementation, we consider the device function Jacobians 1

G(u, t) := ∂g_R/∂u (u, t),   L(j, t) := ∂φ_L/∂j (j, t),   C(u, t) := ∂q_C/∂u (u, t),   (3)

where we use the same notation for the Jacobians as in section 1. Inserting (3) into the original system and writing everything in matrix form yields

A_C C(A_C^⊤ φ, t) A_C^⊤ (d/dt) φ + A_R g_R(A_R^⊤ φ, t) + A_L j_L + A_V j_V + A_I i_s(t) = 0,
L(j_L, t) (d/dt) j_L − A_L^⊤ φ = 0,   (4)
A_V^⊤ φ − v_s(t) = 0,

assuming that φ_L and q_C do not explicitly depend on time.
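To make the assembly of the matrix form (4) concrete, the following sketch builds the (here linear) MNA matrices for a hypothetical toy circuit of our own choosing, a series connection of a voltage source, a resistor and a capacitor; the names mirror the incidence-matrix notation above, but the topology and values are illustrative assumptions, not a circuit from this article.

```python
import numpy as np

# Toy circuit (assumption): voltage source from node 1 to ground, resistor
# between nodes 1 and 2, capacitor from node 2 to ground. Rows of the
# incidence matrices correspond to the non-ground nodes 1 and 2.
A_V = np.array([[1.0], [0.0]])
A_R = np.array([[1.0], [-1.0]])
A_C = np.array([[0.0], [1.0]])

R, C = 500.0, 220e-9
G = 1.0 / R                      # conductance, i.e. the Jacobian of g_R

# Unknowns x = [phi_1, phi_2, j_V]^T; linear system E x' + B x = f(t),
# with E built from the capacitive block as in (4) (no inductors here).
E = np.zeros((3, 3))
E[:2, :2] = A_C @ (C * A_C.T)    # A_C C A_C^T block
B = np.zeros((3, 3))
B[:2, :2] = A_R @ (G * A_R.T)    # resistive contribution to KCL
B[:2, 2:] = A_V                  # source current enters KCL
B[2:, :2] = A_V.T                # voltage constraint A_V^T phi = v_s(t)

# E is singular, so the system is a DAE rather than an ODE
assert np.linalg.matrix_rank(E) < 3
```

The singular leading matrix E is exactly what distinguishes the DAE (4) from an ODE: only the capacitor branch contributes a time derivative.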
We note that (4) readily extends to multiport devices 14. A general inductive n-port, for example, can also be modeled by a flux function φ_L(j_L, t) with j_L = [j_1, ⋯, j_{n−1}]^⊤, however the Jacobian is not necessarily diagonal in this case, as the component functions of φ_L can each depend on all the currents. Regarding the incidence matrices, one chooses a reference node n₀ for the multiport device and then considers n − 1 branches from the remaining n − 1 nodes to the reference. Interpreting the multiport as a single node and using Kirchhoff's current law then gives j₀ = − ∑_{k=1}^{n−1} j_k for the reference current, when orienting all currents to point toward the device. Following this approach, there is no difference in treating multiports compared to one-ports 14.

Index concepts
Before outlining the dissection index, we want to give a brief introduction to index concepts more generally. There are multiple definitions of the index of a DAE, along with related index concepts that each possess different strengths and weaknesses 15. One important aspect that unites these ideas is that they agree in key cases, e.g. when looking at linear DAEs, and the same also holds true for the dissection index. To emphasize the practical importance of the notion of index, we take a closer look at the perturbation index. It is based on a perturbed version of the DAE (1) (we leave out the parameters p for conciseness)

f(x̂′, x̂, t) = δ(t),   (5)

where δ is a sufficiently smooth perturbation, such that the required derivatives exist.

Definition 1.
Let x be a solution of the unperturbed DAE. Then the DAE is said to have perturbation index ν ∈ ℕ, if ν is the smallest natural number such that, for any sufficiently smooth solution x̂ of (5), there exists a constant c ∈ ℝ with

‖x(t) − x̂(t)‖ ≤ c ( ‖x(0) − x̂(0)‖ + ‖δ‖ + ‖δ′‖ + ⋯ + ‖δ^{(ν−1)}‖ )

for an appropriate norm ‖ ⋅ ‖ and the right hand side small enough.
The idea behind this definition is to capture the impact of perturbations on the solution, as the name suggests. Usually these perturbations are assumed small, in the sense that ‖δ‖_∞ ≪ 1, but fast changing, such that ‖δ^{(k)}‖_∞ may grow very quickly in k.

Example
To illustrate the perturbation index, and also to hint at its relevance for circuit simulation, we consider the small example given in Figure 1 (small example circuit for illustrating the perturbation index: a voltage source in parallel with a capacitor). Applying MNA to the circuit and introducing a perturbation yields the following DAE

C (d/dt) φ̂₁ + î_V = δ₁(t),   (6a)
φ̂₁ − v_s(t) = δ₂(t),   (6b)

where x̂ = [φ̂₁, î_V]^⊤. Using (6b) we find

φ̂₁ = v_s(t) + δ₂(t).   (7)

Inserting (7) into (6a) and rearranging then gives

î_V = δ₁(t) − C (d/dt) v_s(t) − C (d/dt) δ₂(t).   (8)

Noting that the solution x to the unperturbed problem follows directly from the perturbed solution by setting δ = 0, we obtain

‖x(t) − x̂(t)‖ ≤ c ( ‖δ₁‖ + ‖δ₂‖ + ‖δ₂′‖ ),

so the unperturbed system corresponding to (6) has perturbation index ν = 2, as (8) depends on the first derivative of δ₂.
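The index-two behaviour can be checked numerically. The following sketch (with toy values for C, ε and ω chosen by us for illustration) perturbs the voltage constraint by δ₂(t) = ε sin(ωt) and observes that the error induced in the source current is amplified roughly by the factor ω, because the current depends on the derivative of the perturbation.

```python
import numpy as np

C = 1e-6                         # capacitance (arbitrary toy value)
eps, omega = 1e-6, 1e6           # tiny but fast-changing perturbation delta_2

t = np.linspace(0.0, 1e-4, 4001)
v_s = np.sin(2 * np.pi * 1e3 * t)            # unperturbed source voltage
delta2 = eps * np.sin(omega * t)

# phi_1 = v_s + delta_2 from the constraint; i_V = -C d/dt phi_1 (delta_1 = 0)
i_V = -C * np.gradient(v_s, t)
i_V_pert = -C * np.gradient(v_s + delta2, t)

err_delta = np.max(np.abs(delta2))           # size of the perturbation itself
err_current = np.max(np.abs(i_V_pert - i_V)) # induced error in the current

# the error in i_V is amplified roughly by the factor omega
amplification = err_current / (C * err_delta)
assert amplification > 1e5
```

A perturbation that stays below 10⁻⁶ in magnitude thus produces a current error about six orders of magnitude larger relative to C, which is exactly the sensitivity that the perturbation index quantifies.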

Index of MNA
The structure and index of MNA are well understood when only considering independent sources 14 (as in (4)), but also when including controlled sources 16. For the case without controlled sources, there exists the following well-known topological index result.
Theorem 1. The index of (4) is ν ≤ 1, if and only if there are no loops consisting only of capacitors and voltage sources containing at least one voltage source, and no cutsets consisting only of inductors and current sources.
The result can be found in terms of the perturbation index 17, the tractability index 14, the differentiation index 16 and the dissection index 13. We again note that there also exist extensive results about when the index of MNA including controlled sources does not exceed ν = 2 16.
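The loop and cutset conditions can be tested mechanically from the incidence matrices. The following sketch is a simplified, hypothetical check of our own: it flags any loop among capacitor and voltage-source branches (without distinguishing whether the loop actually contains a voltage source) and tests the cutset condition via a standard rank argument.

```python
import numpy as np

def has_cv_loop(A_C, A_V):
    """A loop among capacitor and voltage-source branches shows up as a
    linear dependence of the columns of [A_C A_V]."""
    M = np.hstack([A_C, A_V])
    return np.linalg.matrix_rank(M) < M.shape[1]

def has_li_cutset(A_C, A_R, A_V, n_nodes):
    """An inductor/current-source cutset shows up as a rank deficiency of
    the incidence matrix of all remaining (C, R, V) branches."""
    M = np.hstack([A_C, A_R, A_V])
    return np.linalg.matrix_rank(M) < n_nodes

# capacitor in parallel with a voltage source (one non-ground node): C-V loop
assert has_cv_loop(np.array([[1.0]]), np.array([[1.0]]))
# V source at node 1, resistor 1-2, capacitor at node 2: no C-V loop
assert not has_cv_loop(np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]]))
# same circuit: the C, R, V branches span both nodes, hence no L-I cutset
assert not has_li_cutset(np.array([[0.0], [1.0]]), np.array([[1.0], [-1.0]]),
                         np.array([[1.0], [0.0]]), 2)
```

The first test reproduces the index-two situation of the Figure 1 example, while the second and third correspond to an index-one circuit.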

DISSECTION INDEX
We focus on the dissection index, as it enables the decoupling of a DAE into an ODE and a set of purely algebraic equations. This is conceptually different from the perturbation index, however other index concepts such as the tractability and differentiation indices also use decoupling strategies. Still, the dissection index offers some advantages over these concepts, as it provides a simple algorithmic procedure for the decoupling that is similar to the tractability index, but imposes less strict smoothness assumptions. In the case of MNA without controlled sources (4), it is even possible to find a purely topological decoupling based on the dissection index 13. We will not make use of this topological decoupling in the derivation however, but rather consider the dissection index for a more general class of DAEs, to formulate the assumptions that are necessary for our method to work also for DAEs other than (4).
We consider a DAE in standard form 13

A(x) (d/dt) x + B(x) x + f(t) = 0,   (9)

where the matrix-valued functions A and B are derived from (1). Note that the description of MNA in (4) is precisely of this form. In the following, we will demonstrate the first two steps of the dissection index when applied to systems of the form of (9), while stating the assumptions of our approach. Appendix A contains additional remarks showing that these assumptions are fulfilled by (4).

Index one case
Assuming A(x) to be sufficiently smooth with values in ℝ^{n×n}, we define four basis functions P(x), Q(x), V(x), W(x) such that im Q(x) = ker A(x), im W(x) = ker A^⊤(x), and the columns of P(x) and Q(x) together form a basis of ℝⁿ, while the columns of V(x) and W(x) together form a basis of ℝⁿ. We now make the following assumption.
Assumption 1. The basis functions P and Q of A(x) are constant.
Assumption 1 may seem restrictive, but it is fulfilled by many systems occurring in practical applications 13. This in particular includes MNA, as Remark 1 shows. The key idea of the dissection index is to use the basis functions to split the solution variables x into two parts

x = P x̂ + Q x̄,   (10)

where the hat ⋅̂ is used to indicate differential (dynamic) variables and the bar ⋅̄ signifies algebraic (fixed) variables. When inserting the splitting (10) into (9) we obtain

A(x) P (d/dt) x̂ + B(x) P x̂ + B(x) Q x̄ + f(t) = 0,   (11)

and this motivates the notation, as only x̂ appears differentiated in time. The procedure then continues by multiplying (11) once with V^⊤(x) and once with W^⊤(x) from the left, to also split the system. This yields

M(x) (d/dt) x̂ + V^⊤(x) B(x) P x̂ + V^⊤(x) B(x) Q x̄ + V^⊤(x) f(t) = 0,   (12a)
W^⊤(x) B(x) P x̂ + K(x) x̄ + W^⊤(x) f(t) = 0,   (12b)

where we introduce the shorthands M(x) := V^⊤(x) A(x) P and K(x) := W^⊤(x) B(x) Q for the arising products. Using the fact that M(x) is regular by construction 13, we can now define the index one case of the dissection index: the DAE (9) is said to have dissection index ν = 1 if K(x) is regular.
This is motivated by the observation that (12b) is a purely algebraic equation with a locally unique solution for x̄ in terms of x̂, given that K(x) is regular. In this case (12a) then describes an ODE in the differential variables x̂.
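For a constant leading matrix, the basis functions of this first dissection step can be computed numerically, e.g. via an SVD-based null space. The sketch below uses a toy singular matrix A of our own choosing and checks the defining properties stated above.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])   # singular leading matrix of a toy DAE

Q = null_space(A)          # im Q = ker A   -> algebraic directions
W = null_space(A.T)        # im W = ker A^T -> equations without derivatives
P = null_space(Q.T)        # complement: im P spans the differential part
V = null_space(W.T)

# P and Q together form a basis of R^3 (and likewise V and W)
assert np.linalg.matrix_rank(np.hstack([P, Q])) == 3
assert np.linalg.matrix_rank(np.hstack([V, W])) == 3
# multiplying the DAE from the left by W^T removes all time derivatives
assert np.allclose(W.T @ A, 0.0)
```

Splitting x = P x̂ + Q x̄ and multiplying by V^⊤ and W^⊤ then yields exactly the two blocks (12a) and (12b): an ODE part and a derivative-free algebraic part.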

Index two case
As the example from Figure 1 illustrates, there are many DAEs, including those described by MNA, which can have an index higher than one. In these cases the dissection index proceeds by introducing additional basis functions and continuing the splitting process in a similar fashion. We begin by focusing on the algebraic equation (12b), and consider basis functions P̄(x), Q̄(x), V̄(x), W̄(x) of K(x), defined analogously to the ones for A(x). This allows us to further split the algebraic variables x̄ as follows:

x̄ = P̄ x̄₁ + Q̄ x̄₂.   (13)

Inserting this splitting into (12b), and multiplying once with V̄^⊤(x) and once with W̄^⊤(x) from the left, splits the algebraic equation into the two parts (14a) and (14b). We can now also split the differential variables x̂ further, by using basis functions P̂ and Q̂ of W̄^⊤(x) K̂(x), where K̂(x) := W^⊤(x) B(x) P denotes the coefficient of x̂ in (12b),

x̂ = P̂ x̂₁ + Q̂ x̂₂,   (15)

and inserting this splitting into the second algebraic equation (14b) yields (16). We now make the following two assumptions.
Assumption 3. The basis functions P̂ and Q̂ of W̄^⊤(x) K̂(x) are constant, and (16) possesses a locally unique solution for x̂₁ in terms of x̂₂ and t.
We note that while Assumption 2 is equivalent to the DAE not being underdetermined 13, Assumption 3 is rather important for the implementation, but not for the dissection index itself. In fact, our approach still works if the basis functions P̂ and Q̂ depend on x̂₂, however we focus on the stronger assumption here, since it shortens the expressions in the following without impacting the general idea, and Remark 2 shows that MNA fulfills an even stronger condition than Assumption 3.
Having a locally unique solution for x̂₁ in terms of x̂₂ and t at hand, we now turn to the first algebraic equation (14a). Similar to (16), inserting the splitting (15) of the differential variables into (14a) gives a system (17) with a locally unique solution for x̄₁ in terms of x̂₁, x̂₂, x̄₂ and t, as V̄^⊤(x) K(x) P̄ is regular by construction 13. Finally, we move towards the decoupled index two system by expanding x̂ and x̄ in (12a), according to (15) and (13) respectively, which yields (18). Using basis functions Ṽ(x) and W̃(x) of M(x) Q̂, we can now split (18) further by multiplying from the left by Ṽ^⊤(x) and W̃^⊤(x) once each. Reordering then yields the system (19). We observe that (19) is of a form similar to (12), and that Ṽ^⊤(x) M(x) Q̂ is again regular by construction 13. Together with (16) and (17) providing locally unique solutions for x̂₁ and x̄₁ respectively, this motivates the definition of the index two case.
This again follows from the purely algebraic equation (19b) having a locally unique solution for x̄₂ in terms of x̂₁, x̂₂, x̄₁ and t, given that the coefficient of x̄₂ in (19b) is regular. The differential part (19a) then describes an ODE in the index two differential variables x̂₂, analogously to the previous case, and we call x̂₂ the differential variables and [x̂₁, x̄₁, x̄₂]^⊤ the algebraic variables. We note that the procedure may be continued for even higher index DAEs, by repeating the steps of the index two case with x̂₂ and x̄₂ playing the roles of x̂ and x̄ respectively.

First example circuit
We now demonstrate the dissection index by applying it to the example circuit given in Figure 2. The circuit contains a voltage source v_s(t), a linear resistor with resistance R, a linear capacitor with capacitance C, a linear inductor with inductance L, as well as a diode D that is modeled by a nonlinear conductance g_D(φ₃). Comparing the conditions of Theorem 1 with the example circuit shows that the circuit has index ν = 1, thus we only have to perform the first step of the dissection index.
We begin by writing out the system obtained from applying MNA to the example circuit of Figure 2 (first example circuit: simple diode oscillator),
where G = 1/R is the inverse of the resistance. For the first basis functions P and Q we find constant matrices, and as A(x) in the context of (9) is symmetric in this case, we have V = P and W = Q for the remaining two basis functions. Using these, we split the unknowns into x̂ = [φ₃, j_L]^⊤ and x̄ = [φ₁, φ₂, j_V]^⊤ according to (10), which allows us to obtain the systems (20a) and (20b) corresponding to (12a) and (12b). Since (20b) is linear, we can explicitly solve for x̄ in this case. Subsequently inserting this solution into (20a) then gives an ODE and a purely algebraic system (21), as promised.

Second example circuit

We derive a second example circuit from the first by substituting a current source i_s(t) for the voltage source, compare Figure 3.
Looking at the index criteria from Theorem 1, we observe that this circuit has index ν = 2, as there now is a cutset consisting of the inductor and the current source. Therefore, we need to execute two steps of the dissection index in order to split the equations into purely differential and algebraic parts. The corresponding MNA system has a smaller dimension compared to the previous example. The first two basis functions P and Q are again constant, where it holds once more that V = P and W = Q due to symmetry. Omitting the remaining basis functions and the intermediate steps for brevity, we finally obtain an ODE in one differential variable only, together with a purely algebraic system (22) that recovers the remaining algebraic variables. We note that all basis functions of the second step are also constant for this example, as was the case for the first step. As hinted at in section 3, it is possible to find a purely topological decoupling based on the dissection index 13, which agrees with the intuition given by the index result from Theorem 1. When considering large circuits, this topological decoupling along with its topological basis functions is to be preferred over other basis function choices, as it avoids the numerical computation of the basis functions, which becomes prohibitively expensive for large systems. We also note that a similar topological decoupling result exists for circuits including controlled sources, however only for the first step of the decoupling, as it is framed in the context of semi-explicit methods, for which one only requires a DAE in semi-explicit form 13. An extension of this result to circuits of higher index is within the scope of further research.
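To illustrate what the decoupled formulation buys numerically, the following sketch integrates only the ODE part of a toy index-one system of our own construction (not the circuit equations (20)-(22)) and recovers the algebraic variable pointwise from its algebraic relation.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy decoupled system, mimicking the roles of (20a) and (20b):
#   d/dt x_hat = -x_hat + x_bar,   with algebraic relation x_bar = sin(t)
def ode(t, x_hat):
    x_bar = np.sin(t)            # algebraic equation solved for x_bar
    return -x_hat + x_bar

sol = solve_ivp(ode, (0.0, 5.0), [0.0], dense_output=True,
                rtol=1e-9, atol=1e-12)

t_eval = np.linspace(0.0, 5.0, 50)
x_hat = sol.sol(t_eval)[0]
x_bar = np.sin(t_eval)           # reconstruction of the algebraic variable

# analytic solution of the ODE for comparison
exact = 0.5 * (np.sin(t_eval) - np.cos(t_eval) + np.exp(-t_eval))
assert np.max(np.abs(x_hat - exact)) < 1e-6
```

Only the differential variable is integrated in time; the algebraic variable is obtained afterwards at exactly the time points where it is needed, which is the pattern exploited by the learning approach in the next section.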

INDEX-AWARE LEARNING
In the following, we outline the use of the dissection index in the context of machine learning. The workflow is illustrated in Figure 4 and directly follows the structure of the dissection index. We give a general description in a first step, followed by examples using the two circuits from Figure 2 and Figure 3.

1. Our approach begins by performing the decoupling of a given DAE into an ODE and a purely algebraic equation (AE), following the main steps of the dissection index.
2. Afterwards, only the differential variables of the ODE are learned. For a DAE of index one, these would be the entries of x̂, and for a DAE of index two, the entries of x̂₂, using the notation of section 3.
3. The remaining algebraic variables, x̄ for index one or [x̂₁, x̄₁, x̄₂]^⊤ for index two, may then be reconstructed using the algebraic equations.
We remark that the identification of the differential variables, and thus the advantages of the second point, are in principle available for any SPICE-based simulator via the purely topological decoupling 13, whereas the reconstruction of the algebraic variables requires an additional implementation. We summarize the important steps of this additional implementation for the index one and index two cases below.

Index one case
In order to recover the algebraic variables at time t in the index one case, we only need to solve (12b) for x̄(t) using the learned x̂(t).

Index two case
In the index two case, we start by solving (16) for x̂₁(t) using the learned x̂₂(t). Since we also require the derivative (d/dt) x̂₁(t) in (19b), we consider a small time increment Δt and approximate the derivative using the backward difference

(d/dt) x̂₁(t) ≈ ( x̂₁(t) − x̂₁(t − Δt) ) / Δt.

We note that this only reflects our implementation; in principle any finite difference (or similar) approximation is possible. Finally, we determine x̄₁(t) and x̄₂(t) by jointly solving (17) and (19b), using the learned x̂₂(t), the computed x̂₁(t) and the approximation of the derivative.
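A minimal sketch of this reconstruction step, with a hypothetical learned variable and toy algebraic relations of our own standing in for (16), (17) and (19b):

```python
import numpy as np
from scipy.optimize import fsolve

def x_hat2(t):                   # stand-in for the learned GP mean of x_hat_2
    return np.sin(t)

def x_hat1(t):                   # toy version of (16): x_hat_1 from x_hat_2
    return 2.0 * x_hat2(t)

def reconstruct(t, dt=1e-6):
    # backward difference for d/dt x_hat_1, as in the implementation
    dx1 = (x_hat1(t) - x_hat1(t - dt)) / dt
    # toy coupled algebraic system standing in for (17) and (19b)
    F = lambda xb: np.array([xb[0] - (x_hat2(t) + xb[1]),   # "x_bar_1"
                             xb[1] - dx1])                  # "x_bar_2"
    return fsolve(F, np.zeros(2))

xb1, xb2 = reconstruct(1.0)
assert abs(xb2 - 2.0 * np.cos(1.0)) < 1e-4   # derivative recovered up to O(dt)
assert abs(xb1 - (np.sin(1.0) + xb2)) < 1e-6 # algebraic relations satisfied
```

The backward difference introduces an O(Δt) error in the recovered derivative, but the algebraic relations themselves are satisfied up to the tolerance of the nonlinear solver, which is the consistency property emphasized throughout this article.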

Example circuits
In terms of the example circuits from section 3, the workflow amounts to the following: for the first example we consider φ₃ and j_L as the differential variables that need to be learned, and for the second example only φ₃ is left. All the remaining algebraic variables may be recovered using (21) or (22) respectively. Thus the learning effort is already reduced quite significantly in these two examples: from five to two variables in the first and from four down to one variable in the second. Another key benefit comes from the fact that the reconstructed algebraic variables exactly fulfill the inherent constraints of the DAE. This means that even though the learned solution variables (think of φ₃ for example) are only approximations, the reconstructions (e.g. φ₁ or φ₂) will still be consistent. This may be of great importance for systems where the physical interpretability of the solution depends on it satisfying the constraints.
There is yet another, maybe less expected, benefit that might occur. Looking at the decoupled system from (22), we find that the resistance parameter R and the inductance parameter L only appear in the algebraic equation. In terms of our original goal of speeding up design optimization or uncertainty quantification, where many solutions for varying parameter values are required, this means the following: instead of having to solve the full system for a given combination of R and L, we can simply solve the algebraic equation to obtain the full solution. While the algebraic equation might in general be more complicated than the simple linear relations of (21) and (22), solving it is almost certainly much faster than having to integrate the entire system in time. This becomes an even bigger advantage when knowledge about the solution is only required at specific points in time, since the algebraic equation may be solved pointwise. As of now, we have no easy way to automatically determine which parameters appear in the ODE. But when combined with a sequential learning strategy, such as the one outlined in section 5, there may still be computational savings due to the learning method requiring fewer samples for the parameters not appearing in the ODE.
Lastly, we emphasize that the approach is independent of the particular machine learning method that is used for learning the differential variables. Thus methods developed especially for ODEs may be employed and exchanged depending on the problem at hand.

NUMERICAL EXAMPLES
Before we present numerical results, we want to provide some background on our machine learning method of choice, Gaussian processes (GPs), and the particular learning strategy we employ.

Gaussian processes
The following brief introduction is based on the textbook of Rasmussen and Williams 18, and we refer to the book itself for more details. We consider the problem of learning one component x_j(t) of the DAE solution x(t), compare (9), based on observations (t_i, y_i), i = 1, …, N. We will focus on the one-dimensional case for clarity of exposition, however we remark that the ideas extend to the case where the solution component x_j(t, p) also depends on the parameters p, and thus on more than one variable, compare also the textbook 18.
A GP suited for this problem is defined by a (prior) mean function μ : ℝ → ℝ and a covariance function k : ℝ × ℝ → ℝ. The learning problem is then tackled using Bayesian inference, such that one aims to obtain the posterior distribution of x_j(t), given the observations and a point t at which x_j is to be predicted. A particular feature of GPs is that this posterior process, under suitable assumptions, turns out to be another GP, with posterior mean and variance 19

m(t) = μ(t) + α(t)^⊤ (y − μ),   (23a)
k(t) = k(t, t) − α(t)^⊤ k_N(t),   (23b)

where μ := [μ(t₁), …, μ(t_N)]^⊤ denotes the prior mean function evaluated at the observations, k_N(t) := [k(t, t₁), …, k(t, t_N)]^⊤ similarly denotes the pairwise evaluation of the covariance function using the prediction point t and the observations, and y := [y₁, …, y_N]^⊤ are the observed function values. The weights α(t) are given by the solution of

(K + σ² I) α(t) = k_N(t),   (24)

where [K]_{i,j} := k(t_i, t_j) and σ² models i.i.d. additive Gaussian noise on the observations. For a discussion on modeling choices where the assumptions leading to (23) are not fulfilled, see the review article by Swiler et al. 20.
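A minimal numpy sketch of these posterior formulas, with a zero prior mean, an RBF kernel and hand-picked hyperparameters and data (all assumptions of this illustration, not values from the article):

```python
import numpy as np

def rbf(a, b, sigma_k=1.0, ell=0.3):
    # radial basis function kernel evaluated pairwise between two point sets
    return sigma_k**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

t_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2 * np.pi * t_train)       # observed function values

sigma2 = 1e-8                               # observation noise / jitter
K = rbf(t_train, t_train) + sigma2 * np.eye(t_train.size)

t_star = 0.23                               # prediction point
k_star = rbf(np.array([t_star]), t_train)[0]

alpha = np.linalg.solve(K, k_star)          # weights, cf. (24)
mean = alpha @ y_train                      # posterior mean, cf. (23a)
var = 1.0 - alpha @ k_star                  # posterior variance, cf. (23b)

assert abs(mean - np.sin(2 * np.pi * 0.23)) < 1e-2
assert var < 1e-2
```

Near the training data the posterior variance is tiny, while it grows toward the prior variance away from the data; this is the quantity the sequential sampling strategy below exploits.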
The key model component influencing the learning process is the covariance (or kernel) function k, since it determines the approximation properties of the GP. It encodes prior knowledge about the function x_j(t) that is to be learned, such as its differentiability or characteristic length scales. We opt for a radial basis function kernel, given by

k(t, t′) = σ_k² exp( −(t − t′)² / (2ℓ²) ),

where σ_k and the length scale ℓ are hyperparameters, which allow for better approximation capabilities of the GP. The kernel is selected to match the differentiability of the solution.
In practice, the mean function μ is often taken to be zero, as the data is assumed standardized, and the hyperparameters are then determined by minimizing the negative log likelihood 19

− log p(y | σ_k, ℓ).

Learning strategy
The learning strategy aims to exploit one of the key features of GPs: they provide both a mean prediction m and an associated variance estimate k, as detailed in (23). We use these properties in conjunction with a sequential sample selection strategy, that starts out with a small number of training samples and adds further samples based on the variance estimate of the GP. This idea is not new, see again the textbook of Rasmussen and Williams 18 for more references and details, however we still want to outline our particular approach to make the results better interpretable and reproducible. Our implementation proceeds as follows:

1. We select a grid of time points T = { t_i : 1 ≤ i ≤ N_t } and parameter values P = { p_j : 1 ≤ j ≤ N_p }, where each p_j represents a specific combination of parameter values, for which we want to learn the solution of the DAE using a GP.
2. We select a subset D ⊂ T × P and use the corresponding solutions for the initial training of a GP.
3. As a termination criterion, we compute the posterior mean m(t, p) using (23a) in all grid points (t, p) ∈ T × P and check whether the relative prediction error falls below a prescribed tolerance; if it does, we stop.
4. We compute the variance prediction k(t, p) using (23b) for all (t, p) ∈ (T × P) ∖ D and add a point of maximum variance to the training data set D.
5. Finally, we retrain the GP and continue with step 3.

We note that several improvements may be made to this strategy, such as properly maximizing the variance estimate in step 4, instead of sampling on a discrete grid. The implementation is based on the STK toolbox 21.
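The steps above can be sketched as a greedy one-dimensional loop (numpy-only GP with fixed hyperparameters of our own choosing; the actual implementation uses the STK toolbox). Note that for a GP the posterior variance does not depend on the observed values, so the "simulation" itself is only needed when fitting the mean.

```python
import numpy as np

def rbf(a, b, ell=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

grid = np.linspace(0.0, 1.0, 101)        # candidate grid, cf. step 1
train = [0.0, 1.0]                       # initial design, cf. step 2

for _ in range(10):
    X = np.array(train)
    K = rbf(X, X) + 1e-8 * np.eye(X.size)
    Ks = rbf(grid, X)
    # posterior variance on the whole grid, cf. (23b)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    mask = np.array([t not in train for t in grid])
    idx = int(np.argmax(np.where(mask, var, -np.inf)))
    train.append(grid[idx])              # point of maximum variance, step 4

# the greedy strategy spreads the samples over the whole interval
gaps = np.diff(np.sort(train))
assert len(train) == 12
assert gaps.max() < 0.3
```

Greedy maximum-variance selection roughly bisects the largest remaining gaps, which is why the design ends up well spread without any knowledge of the target function.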

First example circuit
We again consider the example of Figure 2, where we choose v_s(t) = sin(600 t) V, R = 500 Ω, and the diode is modeled by g_D(φ₃) = 10⁻¹⁴ ( exp(φ₃ / 26 mV) − 1 ) S. The remaining parameter values are chosen according to the sequential sample selection strategy with 1 mH ≤ L ≤ 3 mH and 100 nF ≤ C ≤ 300 nF, i.e. p = [L, C]^⊤ in the context of section 1. The starting points for the strategy are given by all combinations of the boundary values for L and C, together with t = 0 ms and t = 10 ms once for each combination. We note that the choice to consider L and C, and not for example R, as parameters here is arbitrary and only serves to illustrate the approach. One could still follow the same approach when considering e.g. the initial conditions, or the diode model, as being parameterized. Recalling (21), we see that φ₃ and j_L have to be learned as the differential variables. In addition to these two, we also consider the algebraic variable φ₂, and we then compare the accuracy of learning φ₂ directly to recovering it from the two differential variables. Figure 5 shows the solution for L = 1.7 mH and C = 220 nF, a combination which is not part of the training data.
In Figure 6, we see the convergence of the relative prediction errors with respect to the number of samples used by the sample selection strategy. We observe that, depending on the qualitative complexity of the dynamics, the solution variables take different numbers of samples to reach the same prediction accuracy. It should be noted however, that the total number of samples N_s also includes the number of time points that are sampled, and not only the number of distinct parameter combinations (simulations). The latter are listed in the caption of Figure 6 and turn out to be considerably smaller. One may also note that we only execute the learning strategy up to a relatively large tolerance of 10⁻³. This is due to the fact that both the optimization of the hyperparameters, as well as the computation of the posterior mean and variance, scale badly for conventional GPs, leading to large computation times. Remedies for this exist, see e.g. the book by Rasmussen and Williams 18, however this issue lies outside the scope of this article, as our approach may be combined with any machine learning method of choice. The differences between the mean predictions and simulations for the differential variables, when using the predictions belonging to the smallest relative errors from Figure 6, are shown in Figure 7. The results again correspond to L = 1.7 mH and C = 220 nF, and we observe that the differences are in line with the relative prediction errors of Figure 6. The differences between the mean prediction φ̂₂, the reconstruction φ̄₂ and the simulation of the algebraic variable φ₂ are highlighted on the left of Figure 8. The results again correspond to the same parameter values L = 1.7 mH and C = 220 nF. We observe that there is not much difference between the mean prediction and the reconstruction, however the reconstruction does appear to have a slight advantage in terms of accuracy. Here, one should note that although the reconstruction is exact up to the accuracy of solving the algebraic
equation in (21), it still contains the error from learning the differential variables, hence the overall difference between φ̄₂ and φ₂. To better quantify the difference between the learned and reconstructed solutions, we introduce consistency errors ê(t) and ē(t) based on (21), where ⋅̂ indicates learned variables and ⋅̄ refers to reconstructed variables. For a consistent solution obeying all algebraic constraints, both errors are identically zero. The right of Figure 8 shows these consistency errors when using the same predictions and reconstructions as on the left. We observe a very clear improvement of the reconstructions (consistency error on the order of machine precision) over the directly learned predictions (consistency error as large as 10⁻³). While the central benefit is this adherence to the constraints, only having to learn two of the five solution variables also represents a significant reduction of the learning effort in this case.
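The effect can be reproduced in miniature with a toy constraint of our own (a stand-in for the relations in (21)): learning two coupled variables independently leaves a residual in the constraint, while reconstruction satisfies it exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
phi3 = np.sin(t)                 # "differential" variable (toy)
phi2 = 2.0 * phi3                # toy algebraic relation phi2 = 2*phi3

# learning both variables independently: small, uncorrelated errors
phi3_hat = phi3 + 1e-3 * rng.standard_normal(t.size)
phi2_hat = phi2 + 1e-3 * rng.standard_normal(t.size)
e_hat = np.abs(phi2_hat - 2.0 * phi3_hat)    # consistency error of learning

# reconstructing phi2 from the learned phi3 via the algebraic relation
phi2_bar = 2.0 * phi3_hat
e_bar = np.abs(phi2_bar - 2.0 * phi3_hat)    # identically zero

assert e_bar.max() == 0.0
assert e_hat.max() > 1e-4
```

Even though the reconstructed variable inherits the learning error of the differential variable, it is perfectly consistent with the constraint, mirroring the behaviour seen in Figure 8.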

Second example circuit
For the second example from Figure 3, we select i_s(t) = 10⁻⁴ sin(400 t) A, while all other parameters and the learning strategy remain the same. Recalling (22), we find that φ₃ is the only differential variable that needs to be learned in order to reconstruct the remaining algebraic variables. We again focus on φ₂ as an algebraic variable to obtain a comparative example. Considering similar numerical studies as for the first example, the left panel of Figure 9 shows the solutions of φ₃ and φ₂ for L = 1.7 mH and C = 220 nF, which are again not part of the training data. The right plot shows the convergence of the relative prediction errors with respect to the total number of samples N_s. We again note that the number of distinct parameter combinations is significantly smaller, as listed in the caption. We also see that the relative errors behave similarly for both variables in this case, which is to be expected given the similar appearance of their dynamics in the left plot.
At this point we also return to the discussion from section 4 about parameters that only appear in the algebraic equation. During the learning process, the sample selection strategy requested 21 unique values for C, all of which required full simulations according to (22). The 7 unique values of L that were requested for learning φ2, however, did not require full simulations, but could instead be handled by the reconstruction from (22). This results from the fact that the ODE in (22) does not depend on L, such that the solution of φ3 also does not depend on that parameter. The same goes for the algebraic variable i_L, thus φ2 and φ1 can be reconstructed based only on the knowledge of φ3. We again emphasize that time is also included as a parameter within the sample selection strategy, such that the reconstructions of the algebraic variables only need to be evaluated at the particular points in time that are requested by the strategy, rather than at all time points of the solution.
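A minimal sketch of this saving, under the assumption of a toy model in which the algebraic relation is simply y(t; L) = L·x(t) (the function names and parameter values are illustrative, not taken from (22)):

```python
import numpy as np

# Toy model: the ODE for the differential variable x does not contain
# the parameter L, so one simulation of x serves every requested L.
# The algebraic variable y depends on L only through the (assumed)
# explicit relation y(t; L) = L * x(t).

def simulate_x(t):
    """One 'expensive' simulation of the differential variable."""
    return np.exp(-t)        # closed-form stand-in for an ODE solve

t = np.linspace(0.0, 1.0, 101)
x = simulate_x(t)            # a single full simulation ...

# ... reused for every parameter value the sample selection requests.
requested_L = [1.0e-3, 1.7e-3, 2.2e-3]   # hypothetical inductances [H]
reconstructions = {L: L * x for L in requested_L}

for L, y in reconstructions.items():
    print(f"L = {L:.1e} H -> y reconstructed without a new simulation")
```

Each new value of L costs only a cheap evaluation of the algebraic relation, never another ODE solve.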
The difference between the mean prediction and simulation of φ3, for L = 1.7 mH and C = 220 nF, is shown on the left of Figure 10, while the right shows the differences between the mean prediction, reconstruction and simulation of φ2. The predictions again correspond to the smallest relative errors from Figure 9, and the differences are of the same order. In this case, the reconstruction performs similarly or even slightly worse compared to the mean prediction when only looking at the difference. However, it still constitutes a reduction in learning effort, from four solution variables down to one, and most importantly the reconstruction adheres to the algebraic constraints of the DAE. To further illustrate this point, we again take a look at the consistency errors, now redefined based on (22). In our example, the algebraic equation from (22) once more gives an explicit description of the algebraic variables, such that the reconstruction is accurate up to machine precision, compare ē(t) in Figure 11. When learning all solution variables individually, however, we observe a much larger maximum value of around 10⁻³ for the consistency error ê(t) across all the predicted points in time. In general, the algebraic equation may be nonlinear, such that the reconstruction still leads to a consistency error ē(t) greater than machine precision, depending on the accuracy of the nonlinear system solver.

Rectifier circuit
As a third and larger example we consider the rectifier circuit from Figure 12. Aside from requiring the full index two implementation from section 4, this example also showcases another potential application in the context of electrothermal simulations. Electrothermal simulations are used to investigate the thermal behavior of a circuit by coupling the power that is dissipated in the circuit to a set of equations describing the temperature distribution in the circuit, and by using the temperatures of some components as parameters for certain device functions. In our case, we again model the diodes as nonlinear resistors, however this time the model also includes a temperature dependence 22

i(v_D, T_D) = i_s(T_D) (exp(q v_D / (k T_D)) − 1),

where v_D is the voltage across the diode in forward direction, T_D is the temperature of the diode, q is the charge of an electron, k the Boltzmann constant and i_s(T_D) the temperature dependent reverse saturation current 22. The current source is given by i_s(t) = 0.1 cos(100t) A and the capacitor and resistor are linear with C = 1 mF, R = 50 Ω. The circuit also contains a transformer T12 modeled as a nonlinear inductive multiport, where L(i_L) is a fourth order polynomial in i_L 23. We also note that rectifiers play an important role in many power electronics applications; they are key when converting high AC voltages to lower DC voltages. In the example we still use a current source, since this leads to an index two circuit in this case, compare Theorem 1, which requires the full index two approach from section 4.
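A hedged sketch of such a temperature-dependent, Shockley-type diode characteristic, consistent with the symbols described above (v_D, T_D, q, k, i_s(T_D)); the specific saturation-current law, with parameters i_s0, T0 and E_g, is an illustrative assumption and not the model from the cited reference:

```python
import numpy as np

Q = 1.602176634e-19   # elementary charge q [C]
K = 1.380649e-23      # Boltzmann constant k [J/K]

def i_s(T_D, i_s0=1e-14, T0=300.0, E_g=1.12):
    """Assumed temperature law for the reverse saturation current [A]."""
    return i_s0 * (T_D / T0) ** 3 * np.exp(Q * E_g / K * (1.0 / T0 - 1.0 / T_D))

def i_diode(v_D, T_D):
    """Shockley-type diode current i(v_D, T_D) [A]."""
    return i_s(T_D) * (np.exp(Q * v_D / (K * T_D)) - 1.0)

# The diode current rises sharply with both forward voltage and
# temperature; these are the two temperatures used in the study below.
for T in (65.0 + 273.15, 85.0 + 273.15):
    print(f"T_D = {T - 273.15:.0f} C: i(0.6 V) = {i_diode(0.6, T):.3e} A")
```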
When decoupling the DAE arising from the circuit of Figure 12 using the dissection index, one finds the differential and algebraic variables using the notation from section 3. We first observe that there are only two differential variables x_d, again leading to a significant reduction in the learning effort. We also see that the first differential variable φ34 := φ3 − φ4 is a linear combination of the original variables of the DAE. This is a general phenomenon and does not interfere with our approach, as the original variables may always be reconstructed from the sets of differential and algebraic variables by reversing the splitting using (10), (13) and (15) from section 3. To keep the overall learning effort manageable, we consider the outer diodes D1 and D4 to depend on the same temperature T1 and the inner diodes D2 and D3 to depend on the same temperature T2, such that the vector of parameters is now given by p = [T1, T2]⊤. The consistency errors ê(t) and ē(t) are redefined accordingly, where ⋅̂ and ⋅̄ again refer to learned and reconstructed variables, respectively. For T1 = 65 °C and T2 = 85 °C the consistency errors can be seen in Figure 16. We observe that the reconstructed solution again outperforms the directly learned solution by several orders of magnitude.
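The reversal of the splitting can be sketched as a linear change of variables; the matrix V below is an illustrative stand-in for the basis matrices from (10), (13) and (15), which are not reproduced here, and the numerical values are hypothetical:

```python
import numpy as np

# Toy reversal of the splitting: the original unknowns x are a linear
# change of variables x = V @ [x_d; x_a] with an invertible V.
# Here x = (phi_3, phi_4, i_L) and the differential variable is the
# linear combination phi_34 = phi_3 - phi_4, so phi_3 = phi_34 + phi_4.
V = np.array([[1.0, 1.0, 0.0],   # phi_3 = phi_34 + phi_4
              [0.0, 1.0, 0.0],   # phi_4
              [0.0, 0.0, 1.0]])  # i_L

x_d = np.array([2.0])            # learned differential variable (phi_34)
x_a = np.array([0.5, -1.0])      # reconstructed algebraic variables

x_original = V @ np.concatenate([x_d, x_a])   # original DAE unknowns
z_back = np.linalg.solve(V, x_original)       # and back again

print("original variables:", x_original)
```

Recovering the original unknowns is thus a single matrix-vector product, regardless of which linear combinations the decoupling produced.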

CONCLUSIONS AND FUTURE RESEARCH
This article introduced a new approach for learning the time and parameter dependent solutions of electrical circuits. The approach assumes the circuit to be modeled using MNA, and then exploits the structure of MNA to improve the learning process by splitting it into a learning and a reconstruction step, compare Figure 4. It achieves this by decoupling the underlying DAE into an ODE and a purely algebraic equation. Benefits of the approach include a reduction in the number of variables that are to be learned during the learning step, and the exact adherence of the learned solution to the inherent constraints of the circuit model after the reconstruction step. Numerical examples illustrated both benefits. The examples also showed that the exact recovery of the constraints might improve the accuracy of the learned solutions. Furthermore, additional computational savings may be possible, as some of the parameters of interest only appear in the reconstruction step, which avoids the need for training data with varying values of these parameters entirely. We emphasize that the approach is independent of the machine learning method used during the learning step, such that the learning method may be chosen according to the problem at hand.
Multiple extensions are possible within this index-aware learning framework. A natural first step could be to make use of the topological decoupling that was hinted at in section 3. This would pave the way for the inclusion of controlled sources within the workflow, and thus the industrial use of the approach, with the idea being the extension of the topological decoupling to also allow for controlled sources. Focusing on the idea of adhering to physically meaningful constraints, one could work on extending the approach to a charge conserving variant of MNA, to potentially guarantee charge conservation even for the learned solutions. Yet another direction may be the application of index-aware learning to DAEs arising elsewhere, e.g. in modified loop analysis (the dissection index applies to a more general class of DAEs, as section 3 showed). Finally, separate but related work may focus on improving the learning step by developing new methods that are especially suited for learning ODEs.

Figure 3
Figure 3 Second example circuit: simple diode oscillator with current instead of voltage source.

Figure 4
Figure 4 Schematic workflow of index-aware learning.

Figure 5
Figure 5 Solution of the first example circuit for L = 1.7 mH and C = 220 nF.

Figure 6
Figure 6 Convergence of the relative prediction error for the variables shown in Figure 5. The final accuracies correspond to approximately 180 different combinations of L and C for φ3, 39 combinations for φ2 and 37 for i_L.

Figure 7
Figure 7 Differences between the mean predictions (φ̂3 and î_L) and the corresponding simulations for the differential variables when considering L = 1.7 mH and C = 220 nF. The predictions correspond to the smallest relative errors from Figure 6.

Figure 8
Figure 8 Differences between the mean prediction φ̂2, the reconstruction φ̄2 and the corresponding simulation of the algebraic variable φ2 (left) and the respective consistency errors (right). The predictions correspond to L = 1.7 mH, C = 220 nF and the smallest relative error from Figure 6.

Figure 9
Figure 9 Solution of the second example circuit for L = 1.7 mH and C = 220 nF (left) and convergence of the relative prediction error for the same variables (right). The final accuracies correspond to 31 distinct combinations of L and C for φ3 and 49 for φ2.

Figure 10
Figure 10 Differences between the mean prediction and simulation for φ3 (left) and between the mean prediction, reconstruction and simulation for φ2 (right). The predictions are again made for L = 1.7 mH and C = 220 nF and correspond to the smallest relative errors from Figure 9.

Figure 11
Figure 11 Consistency errors ê(t) and ē(t) corresponding to the results from Figure 10.

Figure 12
Figure 12 Full wave rectifier circuit. The resistances of the diodes depend on the parameters p = [T1, T2]⊤.

Figure 13
Figure 14
Figure 13 Solutions of the differential variables of the full wave rectifier circuit for T1 = 65 °C and T2 = 85 °C.

Figure 15
Figure 16
Figure 15 Differences between the mean predictions (⋅̂) and simulations of all differential and algebraic variables for T1 = 65 °C and T2 = 85 °C.

Remark 1.
Recalling (4), we consider the structure of MNA and now determine the basis functions P and Q. Taking into account the standard assumption, compare Theorem 1, that L(i_L) and C are positive definite, we choose P and Q based on the matrices P_C and Q_C, where im Q_C = ker A_C⊤, 0 and I are zero and identity matrices of appropriate dimensions, and the columns of P_C and Q_C together form a basis of ℝ^{n_φ}, with n_φ the dimension of φ. The description of Q_C results from the fact that (with v := A_C⊤φ) v⊤ C(v, t) v > 0 for all v ≠ 0, and hence the kernel of A_C C(A_C⊤φ, t) A_C⊤ is determined by the kernel of A_C⊤. As A_C is a (constant) incidence matrix, we find that P and Q are also constant.
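For a small illustrative incidence matrix (not taken from the article), the construction of Q_C and P_C described in Remark 1 might be sketched as follows, using SciPy's null_space to obtain ker A_C⊤:

```python
import numpy as np
from scipy.linalg import null_space

# Illustrative (reduced) incidence matrix of the capacitive branches:
# three nodes, two capacitors, and node 3 touching no capacitor.
A_C = np.array([[ 1.0, 0.0],
                [-1.0, 1.0],
                [ 0.0, 0.0]])

Q_C = null_space(A_C.T)          # columns span im Q_C = ker A_C^T
P_C, _ = np.linalg.qr(A_C)       # orthonormal basis of a complement

# Together the columns of P_C and Q_C form a basis of R^3.
basis = np.hstack([P_C, Q_C])
print("rank of [P_C, Q_C]:", np.linalg.matrix_rank(basis))

# Since A_C is constant, P_C and Q_C are constant as well.
```

Because A_C has full column rank here, the reduced QR factor of A_C directly yields an orthonormal basis of its range; for rank-deficient A_C one would truncate to the numerical rank first.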