Quasi‐Newton waveform iteration for partitioned surface‐coupled multiphysics applications

We present novel coupling schemes for partitioned multiphysics simulation that combine four important aspects for strongly coupled problems: implicit coupling per time step, fast and robust acceleration of the corresponding iterative coupling, support for multirate time stepping, and higher‐order convergence in time. To achieve this, we combine waveform relaxation—a known method to achieve higher‐order in applications with split time stepping based on continuous representations of coupling variables in time— with interface quasi‐Newton coupling, which has been developed throughout the last decade and is generally accepted as a very robust iterative coupling method even for gluing together black‐box simulation codes. We show convergence results (in terms of convergence of the iterative solver and in terms of approximation order in time) for two academic testcases—a heat transfer scenario and a fluid‐structure interaction simulation. We show that we achieve the expected approximation order and that our iterative method is competitive in terms of iteration counts with those designed for simpler first‐order‐in‐time coupling.


INTRODUCTION
We consider time-dependent surface-coupled multiphysics applications. Examples are numerous: flutter of airplanes or wind turbines, blood flow in arteries, gas quenching or cooling of rocket engines, to only name a few. We follow a partitioned coupling approach, which abides the natural wish to reuse established single-physics solvers. Hereby, our goal is to have highest flexibility, while still having a fast time-to-solution in a black-box coupling framework. Only minimal information from the solvers is used in that only boundary conditions at the interface are exchanged, as well as geometric interface information, that can be used to map nonmatching interface meshes.
In this article, we focus on the time integration. The combination of different physics oftentimes implies different time scales. Thus, we would like to be able to use different time steps in the involved solvers. In addition, higher-order time integration is required to efficiently produce sufficiently accurate results. An open problem is to design stable, efficient and high-order methods that allow for different time steps in the separate solvers 1 and for the use of black-box solvers. A class of methods that potentially have these properties is waveform relaxations, also called waveform or dynamic iterations. 2,3 These integrate over a time window, allowing the subsolvers to use potentially arbitrary methods and time steps in that window, in the spirit of the partitioned approach. To this end, continuous interpolations in time are employed to provide solution values to the other problem. When the time integration over the window is repeated iteratively, high-order can be obtained by sufficiently high order in the subsolvers, as well as the interpolation. An efficient method is obtained, when this iteration converges fast.
The standard option is to add a relaxation step, which is why the terms waveform relaxation and waveform iteration (WI) are used synonymously in most literature. Waveform relaxation was originally introduced for efficient analysis of large digital circuits where components have to be resolved on different timescales. [4][5][6][7] Today it is widely applied in field/circuit coupling. [8][9][10] Recently, Neumann-Neumann waveform relaxation was suggested 11 and applied to conjugate heat transfer problems. 12,13 Optimal relaxation parameters are determined for the fully discrete version of this specific problem, leading to superlinear convergence. This was later extended to the time adaptive case. 14 The optimal relaxation parameter is, however, problem dependent. This means that when applied as a black-box method, many iterations are needed.
Different alternatives have been suggested to speed up waveform relaxation, for example, convolution waveform methods, 15 Krylov methods 16 and multigrid. 17 Another alternative are waveform relaxation Newton methods. 6 There, the Newton method is formulated on the initial value problems using Fréchet derivatives. All of these offer interesting theoretical properties, which are, however, hard to exploit in practice.
Another idea that is more straightforward to exploit in practice is to use a quasi-Newton method instead of relaxation to improve the convergence of WIs. This has been very successfully applied for nonwaveform partitioned black-box FSI. The basic variant is called interface quasi-Newton inverse least-squares (IQN-ILS). 18,19 As a nonlinear solver, the method is also known as Anderson acceleration. 20,21 Important recent improvements have been parallel execution of solvers, 22 filtering, 23 reducing tuning parameters [24][25][26] and the generalization to multiple coupled solvers. 27 The resulting methods lead to very few coupling iterations, are robust without tuning parameters, and have linear compute and memory complexity in terms of interface degrees of freedom due to matrix-free formulations and due to a mesh independent convergence rate. Implicit coupling together with state-of-the-art IQN methods overcome added mass instability and convergence issues and can therefore also be applied for biomechanical applications. 28,29 In this article, we combine WI with IQN, with the goal of getting a novel fast partitioned black-box coupling method that still has high order and allows for different time step sizes in different solvers. We use the idea of time windows similar to References 2,3, that is, we perform coupling iterations not within single time steps but accross all time steps in a given time window. We consider versions of the quasi-Newton method that differ in terms of which subsets of time step data in the so-called coupling windows are used to approximate the system Jacobian. A similar approach has been followed in Reference 30 for a less general setting. There, subcycling, that is, using several time steps of one solver within one time step of the other one, is considered. For a 1D problem and certain time step ratios, the stability of the time integration is analyzed, which is confirmed by numerical results in two dimensions (2D). Our approach allows different numbers of time steps of both solvers within a so-called coupling window such that the time steps of the involved solvers no longer have to be multiples of each other (called multirate time stepping).
For the description and the numerical experiments, we only consider so-called Dirichlet-Neumann type coupling methods. These are a basic building block in FSI that only use boundary conditions that are available in the subsolvers anyhow. To this end, Dirichlet data or Neumann data, respectively, at the interface are exchanged between the solvers. A generalization for other coupling conditions is possible, but requires additional numerical testcases to evaluate the resulting performance. We do not adapt time steps dynamically in time, but we would like to point out that time adaptivity is straightforward to use in WIs. This is important, since it can give dramatic performance improvements. 31,32 We demonstrate up to third order convergence in time for the combined method. The iteration numbers do not vary much when varying the time step ratios and are rather small. We test the proposed algorithms by using the coupling library preCICE 33 with the finite element frameworks FEniCS 34 and Nutils. 35 We, therefore, use IQN from preCICE and implement WI in a new middle layer between preCICE and the coupled solvers.
The structure of the article is as follows: In Section 2, we explain the mathematical framework for both concepts-WI for multirate coupled time stepping on the one hand and quasi-Newton on the other hand-separately and then show how we can combine them. We then describe in Section 3 how we realized a prototype implementation in a middle software layer between coupled solvers and preCICE. In Section 4, we show numerical results for a coupled heat equation and an FSI scenario, both in 2D. We conclude in Section 5.

COUPLING ALGORITHMS
Before introducing the implementation in the coupling library preCICE, we present the details of our coupling approach allowing for multirate time stepping, higher-order accuracy in time, and fast iterative convergence in several steps: Section 2.1 introduces our notation and the general form of the coupling formulation as used in standard low-order approaches. Section 2.2 explains how to extend this basic coupling with a waveform approach, which includes information from individual time steps to increase the approximation order. Section 2.3 explains how to construct a quasi-Newton scheme for a general setting. Section 2.4 combines both concepts to define quasi-Newton variants for multirate higher-order coupling.

Basic notation and multirate coupling via single value coupling
We consider a coupled problem on two nonoverlapping domains Ω  and Ω  , discretized in space. We use a partitioned approach, that is, we rely on stand-alone solvers for each domain, which we regard as black boxes providing access only to input and output data, but not to discretization details. The coupling of solvers happens at the coupling interface Γ = Ω  ∩ Ω  . We couple the solvers via Dirichlet-Neumann coupling. The Dirichlet solver on Ω  takes a Dirichlet boundary condition at the coupling interface Γ from the Neumann solver on Ω  , while the Neumann solver takes a Neumann boundary condition at the coupling interface Γ from the Dirichlet solver. For FSI, the natural choice is to use the fluid solver as Dirichlet solver with Dirchlet boundary values being velocities or displacements, and the solid solver as Neumann solver with Neumann boundary values being forces or stresses (see Section 4.2). For other coupled problems such as heat transfer, which we use as a second application testcase in Section 4.1, there is no natural choice, but the roles can be chosen arbitrarily. Still, there is typically a numerically better choice, which, for instance, can depend on the material parameters of the two media 36 . Here, Dirichlet boundary values represent temperature values at the coupling interface and Neumann boundary values the heat flux computed from the internal temperature field. Both solvers perform time stepping (explicit or implicit) using their respective schemes in a given time window [t ini ; t ini + Δt] with initial time t ini and window size Δt. Within each window, we assume them to use a fixed, but possibly different time step size. The Dirichlet and the Neumann solver, thus, perform n D or n N time steps of size t  and t  , respectively (Δt = n  t  = n  t  ). We denote the action of a single time step by where u  and u  are the internal variables and c  and c  the coupling variables of both solvers. Please note that all variables are discretized in space, meaning they are vectors with (arbitrary) degrees of freedom as entries. Index i denotes the time level t ini + i ⋅ t  or t ini + i ⋅ t  within one window. We now define simplified operators, as the action of our black-box solvers over the complete window if they use only the values at the end of the coupling window-c  end or c  end , respectively-as input and output values: Given Dirichlet coupling values c  end , the Dirichlet solver computes n  time steps with c ,i+1 ≡ c  end for all i = 0,… ,n  − 1, followed by a postprocessing step returning Neumann coupling values at the end of the coupling window, c  end =  (u ,n  ). If necessary, the postprocessing also includes a data mapping operation between the discretizations of both solvers. Afterward, the Neumann solver computes n N time steps with c N,i+1 ≡ c  end for all i = 0,… ,n N − 1, followed by c  end =  (u  ,n  ). We introduce the simplified notation to express the coupling condition at the interface by means of the fixed-point equation acting only on the Dirichlet interface data c  end at the end of the window. We refer to coupling methods solving Equation (2) in this contribution as single value coupling and denote them as SC(n  ,n  ). Furthermore, the execution of n  and n  time steps in the solvers before coupling at the end of time window is called multirate time stepping. The special case with either n  = 1, n  ≠ 1 or n  ≠ 1, n  = 1 is commonly called subcycling. 37,38 Equation (2) entails a Gauss-Seidel-type coupling, executing  SC and  SC one after the other, while the roles of  SC and  SC could also be swapped leading then to a fixed-point equation in terms of c  end . Furthermore, we could instead also consider a Jacobi-type coupling. 22 For the sake of clarity, we omit both additional variants in this contribution. To achieve a stable coupling also for strongly coupled problems and to increase accuracy, we consider an implicit (or strong) coupling, that is, we repeat  SC and  SC multiple times per time window until the residual falls below a given threshold. This corresponds to a fixed-point iteration for Equation (2) and is a classical Godunov splitting for a single iteration that always yields only first-order in time. We can summarize multirate single value coupling in an algorithmic setting as 1. Initialize c  0 end with value from previous time window, k = 0. 2. Solve the Dirichlet problem (with n  time steps): , meaning: a. Initialize u k ,0 with value from previous time window. b. For i = 0,… ,n  − 1: 3. Solve the Neumann problem (with n  time steps): , meaning: a. Initialize u k+1 is small enough: Go to the next time window.
5. Else: k = k + 1 and go back to item 2.
k denotes the iteration counter. Note that this procedure only converges if H SC is a contraction. Otherwise it requires an additional acceleration step such as underrelaxation or quasi-Newton in item 5 in the algorithm above. We describe details for this acceleration in Section 2.3.

Extension to waveform iteration
WIs are an elegant way to increase the order in time with minimal use of the solvers' internal details, that is, well suited for black-box coupling. By contrast to the method from the previous section, WIs exchange more data than just the solutions at the end points. The basic idea is to use a time-continuous representation of c  instead of only the value c end at the end of the coupling window to connect the two solvers. We can construct representations of the continuous functions c  and c  from all time steps in the window [t ini ; t ini + Δt] by interpolation based on a finite dimensional basis. We, thus, define c  and c  as functions in time: where m  and m  are the number of degrees of freedom at the coupling interface.
If both solvers execute independent time steps within the time window as introduced above, they can now evaluate the continuous representations c  and c  as boundary conditions wherever necessary. If the Dirichlet solver, for example, executes three time steps using the trapezoidal rule (TR) and t  = 1 3 Δt, it samples c  at four points in time (t ini , t ini + 1 3 Δt, t ini + 2 3 Δt, t ini + Δt). If the Neumann solver uses two implicit Euler (IE) time steps of size t  = 1 2 Δt, it evaluates c  at two points in time (t ini + 1 2 Δt and t ini + Δt). If one of the solvers uses higher-order time integration (such as a Runge-Kutta RK4 scheme), necessary substeps can now also be retrieved from c  or c  . Giving continuous boundary value representations to our solvers yields new operators These new operators are composed of two parts: First, the actual solver actions, that execute Equation (1) n  or n  times, respectively, sample the coupling conditions from the time-continuous representations, and return all time-discrete values after postprocessing c  ,i =  (u ,i ) and c ,i =  (u  ,i ), respectively. Second, the interpolation operators, † that generate an interpolant with an error of order  ( Δt p+1 ) from n time-discrete points and an 'initial value' c 0 at t ini (i.e., a total of n ′ = n + 1 datapoints). The new operators read They yield a new fixed-point equation operating on functions in time or, expressed in terms of the time-discrete steps, We denote fixed-point iterations for Equations (3) or (4) as WI(n  ,n  ;p) and the corresponding operator as H p n  ,n  . Figure 1 compares SC(2,5) and WI(2,5;2) in a schematic way. Note that in the limit of Δt to zero, we would recover a semidiscrete discretization with Equation (3) representing a WI with waveforms in an infinite dimensional function space. This situation is regularly considered in the WI literature. In an algorithmic setting, we can summarize WI as 1. From the previous window, get c ,0 and c  ,0 . Set the initial approximation as constant extrapolation c  0 ≡ c ,0 ‡ , k = 0.
2. Solve the Dirichlet problem (with n  time steps): . If, for instance, the Dirichlet solver uses an IE time stepping this means: a. Initialize u k ,0 with value from previous time window. b. For i = 0,… ,n  − 1: ) .

Interpolate to obtain a continuous waveform: c k+1
4. Solve the Neumann problem (with n  time steps): . If, for instance, the Neumann solver uses an IE time stepping this means: a. Initialize u k+1 ) .

Interpolate to obtain a continuous waveform: c k+1
‖ ‖ ‖ small enough: Go to the next window.
7. Else: k=k + 1 and go back to item 2.
Note that, in the general case, different interpolation operators  p  ,c  0 and  p  ,c  0 yielding different orders of error may be used for the two problems. For the sake of simplicity, we restrict ourselves to the case where p  =p  =p.
At this point, we would like to add an example for which combinations of interpolation methods and time stepping schemes have to be used to (i) reproduce the solution of a problem with known polynomial solution of degree p † We occasionally drop the subscript for the quantities n  , n  , c  , c  if we do not refer to a specific side of the coupled problem, but to either. ‡ Other choices for initialization are possible, but are beyond the scope of this article.

F I G U R E 1
Comparison between single value coupling (SC(2,5), left) and waveform iteration (WI(2,5;2), right). For SC(2,5), the Dirichlet solver , depicted in blue, sends the value c  end to the Neumann solver  , depicted in orange.  uses c end as boundary condition for all its respective time steps of size t  , exemplary depicted by the blue arrow for the first time step computing c  ,1 . For WI(2,5;2) the Dirichlet solver  provides the interpolant c  .  samples c  for obtaining boundary conditions for all its respective time steps of size t  , exemplary depicted by the blue arrow for the first time step computing c  ,1 using the sampled value c  (t ini + t). Note that SC(2,5) is not equivalent to WI(2,5;0), that is, waveform iteration with constant interpolation. This is due to the fact that single value coupling does not use piecewise constant interpolation, but globally constant interpolation over the whole time window in time exactly and to (ii) achieve a desired approximation order p in time in general: For (i), we have to use polynomial interpolation of at least degree p to represent this solution exactly at the interface. For time integration, a scheme of at least order p has to be used. For (ii), we have to use a pth-order polynomial interpolation, since we can at most afford an error of (Δt p+1 ) per time window, which accumulates to an error of (Δt p ) over (Δt −1 ) time steps. For time integration, again a scheme of at least order p has to be used.

Interface quasi-Newton methods
After having introduced two specific fixed-point problems with operators H SC and H p n  ,n  in the previous two sections, we now consider a general fixed-point problem and the associated residual: We explain how we can formulate an accelerated iteration with  denoting the acceleration operator. We stop the iteration, when the norm of the coupling residual ||R(x k )|| is small enough. If  is the identity, we get a classical fixed-point iteration (Picard iteration), which only converges if H is a contraction. For FSI, this is typically not true. 39 In early FSI days, the acceleration operator  has been realized as underrelaxation, for example, based on a dynamic Aitken scheme. 40 A numerically more efficient approach is to reuse past iterates to build up an approximate Jacobian of R and use this in quasi-Newton iterations. 18,19 To avoid linear dependencies between information from previous iterations, we have to perform a modified Newton iteration starting from the result of the pure fixed-point iteration (for details, see References 29): as an approximation of the inverse of the Jacobian ofR. To compute JR(x k ), we collect input-output information from past iterates § of H in tall and skinny matrices For transient coupled problems, we need to solve a fixed-point equation per time window. In this case, it is beneficial for efficiency to also include iterates from past windows in V k and W k . 29 This variant is however beyond the scope of this article and subject to future work. The matrices V k and W k define the multisecant equations for the inverse Jacobian To get the classical IQN-ILS, 18 we close Equation (5) by which gives In practice, the Jacobian is not stored, but calculated in matrix-free form from via a QR-decomposition of V k ,

Quasi-Newton waveform iteration
Now we combine the two ideas we discussed above: (i) We have used a WI approach dealing with multirate settings, that is, different numbers n and n of time steps of size t  and t  within a window of size Δt in the involved solvers, such that Δt = n  t  = n  t  , and, at the same time, maintaining higher-order. (ii) We have presented the concept of quasi-Newton methods as an acceleration approach for the resulting interface fixed-point equations. Now, we show how we can combine the two parts toward a robust, efficient, and accurate coupling for partitioned simulations with black-box solvers.
The missing piece here is the answer to the question which residual components we should consider to build our quasi-Newton scheme. For a multirate setting with given n  and n  based on the fixed-point equation Equation (2), the choice is canonical, that is, we use For WI, we have formulated our Dirichlet-Neumann coupling in terms of the fixed-point equation Equation (4). We have all n  time steps of c  included in the fixed-point equation Equation (4). The canonical quasi-Newton variant for this system concatenates all time steps' data in a large vector for both input and output of the operatorR. Thus, we get matrices V k and W k in R (m  ⋅n  )×k . We call this variant QN-WI. This method is computationally more expensive as we have to deal with a substantially larger interface system than for single value coupling. Typically, the size of the interface system has, however, a negligible influence on the overall performance. 29 A variant to reduce the size of multisecant system Equation (5) is to measure the residuals not at all samples, but only a subset. To test such a setting in the following, we only consider one specific case: we only use the last sample c n  , that is, In this case, the approximate matrix J −1 R is no longer a square matrix but in R (m  ⋅n  )×m  . However, we do not explicitly calculate J −1 R , but solve a least squares problem as in Equation (6) with ∈ R k as before. We call this variant reduced quasi-Newton rQN-WI.
Remark 1. If we do not include all time steps in the input vector x and, thus, in the matrix W k , but simply restrict the fixed-point iteration formulation entirely to the last step n  , we would also only update the entry c n  and, thus, not generate an appropriate update of the whole waveform representation of c  .
Remark 2. The method of De Moerloose et al 30 corresponds to WI(1,n;1) or WI(n,1;1), that is, one solver executes n time steps t within one window, while the time step of the other solver is equal to the window size Δt. Linear interpolation is used to transfer information from the larger time step (Δt) solver to the other solver ( t = Δt∕n). In this case, QN-WI and rQN-WI are equivalent. In our general setting, we impose no restrictions on time steps of the involved solvers except that they meet at a common time at the end of the window, that is, the window size Δt is a multiple of the solver's time steps t. Therefore, our generalization also allows to naturally handle more than two coupled solvers.
Remark 3. In literature, WI, that is, H p n  ,n  , is usually combined with (constant) underrelaxation. We compare our quasi-Newton variants to this standard technique in Section 4 and call the latter rel-WI.
Remark 4. In the case Δt to zero, the method would be a quasi-Newton method for Equation (3) with waveforms in an infinite dimensional function space, formulated using Fréchet derivatives.
Summarizing, we can state that QN-WI and rQN-WI can be considered as quasi-Newton variants in space-time, whereas QN-SC only acts in space.

SOFTWARE
We now describe how we realize the aforementioned multirate coupling algorithms in software. In fact, the bigger picture of our research is to implement an elaborate and sophisticated treatment of the temporal dimension into the widespread open-source coupling software preCICE. 33 This article is an important milestones for this endeavor. The purpose of this section is manifold: to motivate and justify software design choices, to make our research reproducible, also in different setups, and to allow existing preCICE users to quickly pick up, use, and further validate the novel multirate coupling algorithms. It is, however, also safe to directly jump to the numerical results in Section 4. preCICE is a library that allows to couple existing (legacy) simulation codes to complete multiphysics simulations in a minimally invasive way. preCICE is written in C++, but also offers bindings for C, Fortran and Python. For many community codes (such as OpenFOAM, CalculiX, or SU2), ready-to-use adapters are provided. 41 The setup of coupled codes and of the data to be exchanged is configured at run-time through a global XML file. Various quasi-Newton algorithms are already implemented in preCICE including IQN-ILS 18 as introduced in Section 2.3.
Whereas single value coupling, as introduced in Section 2.1, is based on exchanging the values of the coupling variables at window end points, WI, as introduced in Section 2.2 needs time-continuous representation of the coupling variables over complete windows. This important mathematical difference is also reflected in the software requirements. preCICE currently only offers interface methods such as data = read_data(...) and write_data(data,...) to read and write values at a single point in time, which can be interpreted in most cases as the window end t ini + Δt. Afterward, the interface method advance() triggers the exchange of these values between both solvers and the computation of the quasi-Newton acceleration. This set of methods provides everything needed to realize single value coupling, which, in fact, is already fully supported 42 by the preCICE version we use in this contribution, v1.6.1. WI, however, requires some notion of time in the application programming interface (API) of preCICE, which would make a fundamental redesign of the coupling logic necessary.

Waveform bindings
As the choice of the most suitable WI algorithm for preCICE is still an open research question, we instead aim for a prototype software design, which allows to easily test various multirate approaches without requiring immature refactoring of preCICE. A mature implementation of WI in preCICE is subject to future work. We, therefore, decided that the prototype should implement WI in a new middle layer called waveform bindings, sitting between the adapter code and the preCICE library. Whereas preCICE does the communication and the quasi-Newton acceleration of the coupling data, the waveform bindings handle the time-continuous representation. With this solution, we do not require any alterations to preCICE itself. Figure 2 sketches the layered architecture.
The essential trick to realize this separation of concerns is to treat a complete time window like a single time step in preCICE. To this end, the data fields of preCICE need to be altered: instead of one data field, such as Displacements, we now have one data field per time step of a window, Displacements1, Displacements2, and so on. preCICE can be configured to handle and communicate multiple fields, but has no notion of time for these fields, they are just several data fields for a single preCICE time step. The waveform bindings, on the other hand, have this notion of time: Displacements1 are the values after the first time step within a window, Displacements2 after the second time step, and so on. The API of the waveform bindings now offers methods such as write_data(data,..., time) to write timestamped data at a specific time step t ∈ {t ini + t, t ini + 2 t, … , t ini + Δt} and data = read_data(..., time) to read data at an arbitrary t ∈ [t ini , t ini + Δt]. If the latter does not coincide with a given time step, an interpolated value is returned. For data interpolation, we use a B-Spline curve. 43 The corresponding implementation is provided in the SciPy package ¶ via the functions interpolate.splrep and interpolate.splev, which are used to compute interpolating B-Spline curves of degree one to four.
As we only use n + 1 data points in time and, in particular, neither use values from previous windows nor derivatives in time for B-Spline interpolation, we have the restriction p ≤ n for  p,c 0 . We use a pth-order B-Spline, meaning piecewise linear (p = 1), quadratic (p = 2), or cubic (p = 3) interpolation. A first-order B-Spline requires two data points, that is, n ≥ 1, and is, thus, always possible, a second-order B-Spline requires at least three samples, that is, n ≥ 2, and a third-order B-Spline at least four samples, that is, n ≥ 3. Therefore, accuracy is limited by the number of time steps n that defines the maximal polynomial degree p of the interpolation scheme. In future work, we intend to remove this restriction by using data from previous windows.
The waveform bindings also offer a function advance() which in turn calls the preCICE advance() upon completion of a window to communicate the discrete values (such as Displacements1, Displacements2, … ) between both solvers. In the preCICE configuration, the quasi-Newton acceleration can now be configured to be computed on the full set of data fields or only a reduced subset as explained in Section 2.4. ¶ SciPy version 1.3.1

Application programming interface
To only require minimal modification to existing adapters and testcases, the main API design goal of the waveform bindings # is to mimic the preCICE API as closely as possible. In fact, solely the data access functions are altered as explained above to include time as an additional input argument. Therefore, rewriting an existing coupled code to use the waveform bindings instead of preCICE directly requires modifying less than ten lines of code, which makes alongside testing of single value coupling and WI easy. Listing 1 exemplarily shows how the waveform bindings are used to couple a structure solver. The code is nearly identical to a coupled code directly using the preCICE API. In the following, we explain the slight differences. For a detailed explanation of the preCICE API itself, we refer to the respective documentation || . The obvious change in line 1 is to import waveform_bindings instead of precice. Afterward, the waveform bindings need additional configuration in lines 11 to 13. The number of time steps per window on both sides as well as the interpolation strategy need to be defined. For the data access in lines 23 and 31, the preCICE API is extended as mentioned previously. In the example, the structure solver only samples force values at the end of each time step t + t in line 23, as we use an implicit Newmark-beta method and, thus, only require these values. For other time stepping methods, forces could be read from the waveform bindings wherever necessary.

Limitations
The layered architecture, however, also comes with technical shortcomings. First, as the waveform bindings build upon the Python bindings of preCICE, currently only codes in Python can be coupled. Besides this language restriction, the software concept comes with a few additional limitations as expected for a prototype. For quasi-Newton acceleration, preCICE allows to reuse iterations from previous time steps to achieve faster convergence. As preCICE has no notion of time for the samples Displacements1, Displacements2, … , the reuse strategy would give wrong values. We, therefore, simply switch off this functionality for all experiments in the following section. For the same reason, we cannot use underrelaxation in preCICE, neither directly nor as starting value for the quasi-Newton methods. Otherwise the first relaxed iteration would be a weighted combination between the current solution and the waveform of the last window. Waveform relaxation rel-WI as introduced in Section 2.4, however, needs a weighted combination between the current solution and a constant waveform extrapolated from the end value of the previous window. As a remedy, we realized underrelaxation directly in the advance() function of waveform bindings and switched it off in preCICE. Last, each coupled solver needs to know the number of time steps per window for both sides at configuration, see Listing 1, lines 11 to 13, contradicting the flexibility argument of black-box coupling. We aim to eliminate all these limitations with a mature implementation of WI within preCICE in future work. Within pre-CICE in particular means to maintaining backward compatibility. To which extent this is possible is an open research question.

NUMERICAL RESULTS
We consider two testcases to evaluate the performance of the proposed algorithms both in terms of convergence properties of the quasi-Newton iterations and in terms of the achieved approximation order in time: (i) a simple 2D heat equation (diffusion equation) with known analytic solution, where the computational domain has been divided artificially into two partitions, (ii) a fluid-structure interaction scenario, where a solid truss is deformed by the surrounding fluid flow. Both examples are discretized in space using finite elements. To comply with the restriction of the waveform bindings to Python, we use the finite element frameworks FEniCS **34,44 and Nutils † † . 35

Conjugate heat transfer
With this academic testcase, we want to test two things: (i) How well does QN converge for WI, which combination works best? (ii) Can we recover higher-order in time? We look at both in this order. As most of our conclusions are based on a known exact solution u exact , the numerical methods in the solvers have been chosen accordingly. In particular, the solvers use finite elements of sufficiently high order to be able to solve the problem without spatial discretization error.

Scenario setup
We solve the dimensionless 2D time-dependent heat equation as described in the FEniCS tutorials book. 45 In the following, we extend this example, such that it can be used to assess the quality of the previously discussed algorithms. For details on finite element discretization and solution strategies in FEniCS, we refer the interested reader to the FEniCS tutorials book. 45 The right-hand side f as well as the boundary conditions in Equation (7) are manufactured such that we get the analytical solution u exact (x, y, t) = 1 + g(t)x 2 + 3y 2 + 1.2t.
Using P 2 finite elements and a sufficient approximation order in time allows us to recover u exact and to compute the exact heat flux ⃗ q exact = ⃗ u exact via finite element projection. To use the heat equation as a benchmark for partitioned solution strategies, we divide the domain Ω into two independent subdomains Ω  = [0, 1] × [0, 1] (handled by the Dirichlet solver) and Ω  = [1, 2] × [0, 1] (handled by the Neumann solver) such that Ω = Ω  ∪ Ω  and the coupling boundary Γ = Ω  ∩ Ω  , compare Figure 3. This setup will be referred to as the partitioned heat equation. We use a triangular equidistant mesh with 20 × 20 × 2 triangles in Ω  and Ω  . We extract the temperature u or the normal component of the heat flux q ⟂ on Γ from one participant to be provided for the other as Dirichlet boundary condition u = c  or Neumann boundary condition ⃗ n ⋅ ⃗ u = q ⟂ = c  , respectively. For time stepping, we use IE, TR, or a fourth-order spectral deferred correction (SDC) scheme, 46 depending on the choice of and the goal in terms of approximation order in time. We construct the SDC scheme from an IE time integration scheme with three Gauss-Lobatto collocation nodes at t ini + i t, t ini + (i + 0.5) t and t ini + (i + 1) t, i = 1… n and K correction sweeps per time step of size t.

Quasi-Newton convergence
In Section 2.4, we introduced two variants combining WI coupling with quasi-Newton acceleration, QN-WI and rQN-WI.
In addition, we consider classical underrelaxation for rel-WI and the plain fixed-point iteration full-WI. For rel-WI, we apply an underrelaxation of 0.5, which is optimal for the semidiscrete case. 11 We compare these methods concerning the speed of convergence. As a reference, we compare the convergence of all WI coupling acceleration methods to the convergence of QN-SC for SC(n  , n  ). The latter quantifies the price we pay for higher-order in time in terms of iteration counts per time step. As we are not considering the approximation order of our time stepping scheme in this paragraph, we use IE time integration combined with piecewise linear interpolation for WI in both domains (WI(n  , n  ;1)). Note that for the setups in which WI is used (QN-WI, rQN-WI, rel-WI, full-WI acceleration schemes, we use a QR2 filter 23 for the QR decomposition with limit = 10 −3 and a residual-sum weighting of the individual sub vectors. 29 As testcase, we use g tri (t), since for g pol (t), the g(t)x 2 term quickly dominates the equation leading to a much simpler coupled problem. See Figure 4 for the iteration per time step for QN-WI for different choices of g(t). Only the periodic g tri (t) shows a rather constant behavior of the iterations per time step.
We measure the average number of iterations required to terminate in every window until T = 10 and present the respective results in Table 1. We evaluate the performance for different window sizes Δt = 0.1, 0.2, 0.5, 1.0, 2.0, 5.0 and different multirate settings with n  , n  ∈ {1,3,5}. Our study showed that full-WI, that is, WI without acceleration, does not converge within 100 iterations per time step. Thus, we present only iteration counts for the other four variants. The most important findings are summarized in the following: 1. For rel-WI, the number of coupling iterations is almost independent of the multirate setup. All quasi-Newton variants need two to three times fewer iterations than rel-WI. 2. Overall, QN-WI is more robust, while rQN-WI is more efficient for special cases. In general, rQN-WI seems to have difficulties for setups with many substeps on both sides, while QN-WI can better cope with these setups. rQN-WI and QN-WI perform equally well for WI(n  ,1;1) as both methods are equivalent for these cases. 3. For all variants, the number of iterations slowly decreases with decreasing window size Δt. This effect is observable, but not very significant. Thus, there is potential to make coupling more asynchronous by applying WI on larger windows. 4. In comparison to existing literature, 29 the number of iterations needed in our experiments is relatively high although the application is not particularly challenging. We explain this with the fact that our current WI algorithm does not F I G U R E 5 Heat-heat coupling: visualization of coupling quantities temperature u and heat flux q x and their time derivatives t u and t q x at interface point (x,y) = (1,0.5) over time t (solid lines) together with the respective exact solutions (dotted lines). We show the quantities for three windows of size Δt = 2 and a total simulation time T = 6. We use WI (5 , 3 ; 2) and g=g tri . Note that the coupling quantities are C 0 continuous across coupling windows, but not C 1 continuous, which can be seen from the discontinuity in the first derivative in time at the window boundaries t = 2,4. The quantities u and t u are scaled for better readability yet apply further optimization of quasi-Newton that reduces the number of iteration dramatically in Reference 29. In particular, we do not reuse data from previous time windows. Implementing this reuse for WI is subject to future work. 5. QN-SC performs well and does not suffer from multirate, even though the time steps ( t) are ignored by the acceleration scheme. Note, that QN-SC uses an easier convergence criterion than the other two quasi-Newton variants, since only the last sample is considered at the end of the window and recall that this method does not allow to achieve more than first-order accuracy in time.
From these findings, we conclude that the convergence speed of the quasi-Newton based variants QN-SC, QN-WI, and rQN-SC is relatively similar. The underrelaxation based variant rel-WI performs significantly worse. In the remainder of this article, we will therefore focus on the quasi-Newton based variants.

Approximation order in time
Above, we have examined the convergence speed of different iterative coupling algorithms and concluded that only quasi-Newton accelerated algorithms are able to reach a competitive number of iterations. In the following, we concentrate on the accuracy of the quasi-Newton based multirate setups WI(n  , n  ;p) and SC(n  , n  ). Note that we do not apply any kind of additional conditions to enforce continuity across windows (see Figure 5). In a first step, we use g pol (t) and study for which setups and p we cannot observe any discretization error in the solution u with respect to the exact solution u exact . As stated in Section 2.2, a time stepping scheme of at least order p and waveform interpolation of at least degree p are required to achieve this. We consider the numerical solution u of the partitioned heat equation to be equal to the exact solution u exact and thus free of any discretization error in time and space, if the L 2 (Ω) error is smaller than 10 −12 in every time step. To avoid errors being introduced by the iterative coupling procedure, we use a very strict relative coupling tolerance of 10 −12 . As explained above, our discretization uses piecewise quadratic finite elements for the computation of u to be able to represent u exact without discretization error in space.
Using QN-SC for SC(n  ,n  ), we observed that only for the trivial nonmultirate cases with n  , n  = 1 using IE and = 1 or TR and = 2, where no interpolation is necessary, the exact solution can be recovered. The only multirate settings, where the exact solution could be obtained are the edge cases SC(1,n  ), n  ∈{1,2,3,5} with = 1 and IE. Here, the starting Dirichlet solver only requiring data at the end of the time window computes the correct flux for the end of the Neumann window, which is the only place where we measure the error. We expect the single time steps on Neumann side to not give the exact solution. For other multirate setups (such as SC(5,2)), we could not obtain the exact solution, which was to be expected.
In a second step, we determine the convergence order of time stepping for the setup using g tri . For the convergence study, we determine the L 2 (Ω) error of the numerical solution u with respect to the exact solution u exact . We consider the error at final simulation time T = 1 and decrease the window size Δt. We use a moderate coupling tolerance of 10 −5 . Figure 6 shows a convergence study for WI(5,3;p) and SC(5,3). Note, that we intentionally pick a setup where interpolated values are required by both participants of the coupled simulation at every time step. If WI(5,3;p) is used for coupling in combination with a time stepping method of order ≥p, we expect convergence order p as detailed in Section 2.2 . We observe this for IE and TR, combined with linear and quadratic interpolation, respectively. For the fourth-order SDC and cubic interpolation, we observe a convergence order of approximately 3.5. In addition, we observe that the accuracy of SDC is bounded by the coupling tolerance for small Δt and the error does not scale with the time step anymore. SC(5,3), on the other hand, gives slow to no convergence and significantly higher errors for IE and TR time integration methods.
To investigate the effect of multirate schemes being used, we perform another convergence study, where TR with WI(2,2;p), WI(5,2;p), WI(2,5;p), and WI(5,5;p) is used (see Figure 7). The degree of the interpolating polynomials is irrelevant for cases, where no interpolation is needed, that is, WI(2,2;p) and WI(5,5;p) reach second-order regardless of the interpolation scheme being used. Note that this effect is only expected to occur for time stepping schemes such as TR or IE, where no additional function evaluations in time are required. For SDC, we did not observe this effect. For WI(5,2;p) and WI(2,5;p), piecewise quadratic interpolation (p = 2) allows us to reach second order of the time stepping scheme, as expected. Here, the benefit of multirate is clearly visible: WI(2,5;2) shows a significantly lower error than WI(5,2;2) and almost the same as WI(5,5;2), which means that higher resolution in time for  is more beneficial than for . Presumably due to the term x 2 sin(t) in u exact , which is significantly larger in Ω  .
If only piecewise linear interpolation (p = 1) is applied, second order can be reached for certain cases (eg, WI(2,5;1)). This might be due to the smaller time steps on the Neumann side, from which the Dirichlet solver reads the interpolated data. In other words, it can be assumed that what we observe is the convergence of the dominating spatial error of the Dirichlet solver (using a 2.5 times larger time step). WI(5,2;1) shows the expected deterioration of the order below second order. We observe that, even though reaching second-order convergence, the accuracy of WI(2,5;1) is significantly lower than for the equivalent case with higher-order interpolation WI(2,5;1).
To summarize, we conclude that the converged solutions of QN-WI for WI(n  ,n  ;p) fulfill the expectations in terms of accuracy. SC(n,n), on the other hand, yields only first-order accuracy. We observe that interpolation plays an important role for WI. If we want to avoid order degradation, we have to use an interpolation with a sufficiently high polynomial degree. The required degree depends on the order of the time integration method, usually we require the polynomial F I G U R E 7 Heat-heat coupling: Convergence of the L 2 (Ω)-error at T = 1 for the trapezoidal rule as time integration method on both sides, QN-WI, and various multirate setups and interpolation methods degree p to be at least equal to the order of the time stepping scheme. However, we notice that, for exceptional cases, p lower than the order of the time stepping scheme can be sufficient.

Fluid-structure interaction
We now study whether we can generalize our observations from the relatively simple heat transfer problem to a more involved fluid-structure interaction problem. We show that QN-WI coupling allows us to couple two different second-order time integration methods, a TR used in the fluid solver and a Newmark-method used in the solid solver to an overall second-order coupled simulation even when combining different time step sizes in a multirate setup.
As FSI scenario, we study the Perpendicular Flap testcase from the preCICE tutorials § § . An elastic flap with a width of 0.1 m and a height of 1.0 m is mounted in crossflow at the center of a channel with a length of 6.0 m and a height of 4.0 m. Figure 8 left shows the setup at maximal deflection. This testcase is not experimentally validated, but cross-validated with different solver combinations. Compared with the preCICE tutorial, we slightly adjust the physical parameters: we use a higher fluid viscosity and a stiffer solid material to avoid requiring a finite element stabilization in the fluid solver and to use an only linear model in the structure solver, respectively. We set the fluid density to F = 1.0 kg∕m 3 , the kinematic viscosity to F = 1.0 m 2 ∕s, the solid density to S = 3.0 ⋅ 10 3 kg∕m 3 , the Young's modulus to E = 4.0⋅10 7 kg/ms 2 and the Poisson ratio to = 0.3. On the left boundary a constant inflow profile in x-direction of 10 m/s is prescribed. The right boundary is an outflow and the top and bottom of the channel as well as the surface of the flap are no-slip walls.
In the following, we shortly describe the solid and the fluid solver we use. Note that most details, in particular the spatial discretization, can be chosen arbitrarily for the purpose of this article. Thus, we only mention the respective (state-of-the-art) methods and put more emphasis on additional features that must be offered by a solver to allow higher-order time stepping in the coupled simulation. This, in particular, concerns the fluid solver: Whereas a stand-alone second-order fluid solver only needs to provide velocity and pressure with second-order accuracy in time, a fluid solver in a coupled simulation also needs to provide mesh displacements and forces with suitable accuracy.
For the solid domain, we use the linear elasticity solver developed in FEniCS and validated against CalculiX as part of the Bachelor thesis of Hertrich, 47 to which we also refer for details. We use a structured mesh with 5 × 50 cells, where each cell is cut into two triangles and P 2 finite elements for discretization. For time integration, as mentioned previously, we use a second-order Newmark-method. The resulting linear system is solved by a direct solver.
For the fluid domain, we implemented a new incompressible Navier-Stokes solver in Nutils specifically for this testcase. We use a structured adaptive mesh as depicted in Figure 8 left. We formulate the fluid equations fully coupled and fully nonlinearly in an arbitrary-Lagrangian-Eulerian framework and discretize with Q 2 − Q 1 Taylor-Hood elements. The nonlinearity is resolved by Newton's method with line-search up to an absolute residual of 10 −9 . A direct solver is used for the resulting linear systems. The mesh displacement is modeled by a simple Laplace equation, discretized by Q 1 elements and again solved with a direct solver. For time integration, we make use of the second-order TR, meaning we replace the time derivative of the velocity by the finite difference (u i+1 − u i )∕ t and use a linear average between the variables of the current and the previous time step for the diffusive and the convective term. To avoid pressure oscillations, however, we take the pressure gradient and the continuity term only at the new time level.
To get to second order in a coupled FSI setup, a few enhancements of the fluid time integration are necessary, as mentioned above. First, a second-order backward difference scheme is needed to compute the mesh velocity from the mesh displacement, where u mesh is the mesh velocity, d the mesh displacement, and the index i the time step. Second, the computation of forces acting on the solid needs to be constructed carefully. We compute fluxes over the fluid boundaries by evaluating the residual of the weak formulation at the current solution and extracting the coupling boundary. 48 Afterward, we compute forces from the fluxes. To get second-order-accurate forces, we need to adapt the temporal discretization of two terms of the weak form compared with the trapezoidal form explained above. As time derivative of the velocity, we here use a second-order stencil, where u is the velocity. Furthermore, the TR with a fully implicit pressure gradient as explained above only results in second-order pressure values at the midpoint p i+0.5 . Using these values at the time level i + 1, as often done, is, however, only first-order accurate. To get second-order-accurate values at this time level, we extrapolate in the pressure gradient term.
As the fluid solver and the solid solver use nonmatching meshes at the coupling boundary, we require data mapping methods between both meshes. We, therefore, use radial-basis function interpolation with thin-plates splines as provided by preCICE. 49 In the fluid solver (Nutils), we directly read the displacement values at the Gauss points. FEniCS, however, requires a functional description of boundary conditions. We, therefore, read force values at the mesh vertices and construct the boundary function again by a radial-basis function interpolation; this time in the FEniCS adapter. 47 To validate the new fluid solver, its mesh movement, and its force computation, we compare the Nutils-FEniCS results against OpenFOAM-FEniCS results from the preCICE tutorials. For both simulations, we use Δt = 0.01 s and QN-SC(1,1) as coupling algorithm. Figure 8 shows a close agreement for the x-displacement of the tip until T = 5.0 s for an initial fluid at rest.

Convergence study
To study the temporal convergence order of a coupled simulation, we compute the Perpendicular Flap testcase until T = 0.01 s with decreasing window sizes Δt l = 0.0025 ⋅ 2 −l s for time levels l = 0,1,… ,4. For all levels, we compute QN-WI with WI(1,1;1), WI(2,3;2), and WI(3,2;2). As reference solution for all methods, we use WI(1,1;1) with time level l = 6. A crucial ingredient to see second order are suitable initial conditions. We start the fluid solver from an initial Stokes solution and then scale the forces on the structure until T/2 with a C 1 -continuous sin 2 ( (t + t)∕T) ramp. We measure , where x denotes the coordinates at the coupling interface and m the number of vertices. Thus, d(x ) are nodal evaluations. Figure 9 compares the errors for all time levels. QNWI coupling achieves a clear second-order convergence for all three considered multirate setups. The errors of WI(1,1;1) appear to be nearly the same as those of WI(3,2;2), but for two times smaller time window sizes, which indicates that the solid time step size dominates the overall error. This again shows the great potential of multirate time stepping as the fluid problem is the far more expensive one. Thus, using a smaller time step size for the structure solver is almost for free. We compute the coupling iterations up to a relative residual error of 10 −6 and list the average number of iterations in Table 2. We can observe a slight increase of required iterations with decreasing window size, contrary to our observations for the heat transfer testcase in the last section. This does not come as a surprise as, for incompressible FSI and vanishing structural mass, a similar behavior is reported in literature. 39 It opens, however, the possibility for optimization as increasing the window size while keeping the time step size constant might lead to a more efficient numerical method, besides also having all the computing advantages of increased asynchronicity (such as reduced communication and better balancing of varying computational load). A detailed study is, however, beyond the scope of this article. Finally, comparing the required iterations of WI(2,3;2) with WI(3,2;2) could indicate that the structural time step size is the critical part.

CONCLUSIONS AND OUTLOOK
This contribution shows that using WI in a black-box approach for partitioned multiphysics simulations is possible. Black-box means here in particular that the required interpolation in time can be computed without using discretization details of the coupled codes. An elegant prototype software design using a middle-layer called waveform bindings allows us to test arbitrary multirate settings with the coupling library preCICE without any alterations to the latter. Arbitrary multirate settings means that we allow any (and different) numbers of time steps in the involved solvers within a so-called coupling window. Our results show that (i) the implementation actually achieves the predicted order of consistency in time and that (ii) we can generalize interface quasi-Newton methods to the space-time domain. For more complex scenarios such as fluid-structure interaction, we observed that actually achieving higher-order requires careful consideration of all solver components, of mesh moving, and of evaluations of coupling variables.
Our current implementation has a couple of limitations, which we intend to tackle in future work: (i) We have to move the functionality of the waveform bindings to preCICE and decide how to do this while maintaining backward compatibility of the application user interface as far as possible. This also removes the technical restriction to Python. (ii) Features that optimize the performance of quasi-Newton schemes need to be provided also for WIs. This holds in particular for reuse of previous time step data and improved initial guesses for the initial approximation (currently constant extrapolation) of interface data before the first iteration in a new time window. (iii) Our methods should be generalized to adaptive time stepping, which is less of a mathematically unsolved problem, but a software issue as it might require varying numbers of time steps and, thus, varying numbers of coupling data structures in preCICE from one time window to the next. In addition to these enhancements, further studies for arbitrary high order (as in Reference 50) will be necessary.