Quasi-Newton Waveform Iteration for Partitioned Fluid-Structure Interaction

We present novel coupling schemes for partitioned multi-physics simulation that combine four important aspects for strongly coupled problems: implicit coupling per time step, fast and robust acceleration of the corresponding iterative coupling, support for multi-rate time stepping, and higher-order convergence in time. To achieve this, we combine waveform relaxation -- a known method to achieve higher order in applications with split time stepping based on continuous representations of coupling variables in time -- with interface quasi-Newton coupling, which has been developed throughout the last decade and is generally accepted as a very robust iterative coupling method even for gluing together black-box simulation codes. We show convergence results (in terms of convergence of the iterative solver and in terms of approximation order in time) for two academic test cases -- a heat transfer scenario and a fluid-structure interaction simulation. We show that we achieve the expected approximation order and that our iterative method is competitive in terms of iteration counts with those designed for simpler first-order-in-time coupling.


Introduction
We consider time dependent fluid-structure interaction (FSI) problems. Examples are numerous: flutter of airplanes or wind turbines, blood flow in arteries, gas quenching or cooling of rocket engines, to only name a few. We follow a partitioned coupling approach, which abides the natural wish to re-use established single-physics solvers. Hereby, our goal is to have highest flexibility, while still having a fast time to solution in a black-box coupling framework. Only minimal information from the solvers is used in that only boundary conditions at the interface are exchanged, as well as geometric interface information, that can be used to map non-matching interface meshes.
In this article, we focus on the time integration. The combination of different physics oftentimes implies different time scales. Thus, we would like to be able to use different time steps in the involved solvers. Additionally, higher-order time integration is required to efficiently produce sufficiently accurate results. An open problem is to design stable, efficient and high-order methods that allow for different time steps in the separate solvers [1] and for the use of blackbox solvers. A class of methods that potentially give these properties are waveform relaxations, also called waveform or dynamic iterations [2,3]. These integrate over a time window, allowing the subsolvers to use potentially arbitrary methods and time steps in that window, in the spirit of the partitioned approach. To this end, continuous interpolations in time are employed to provide solution values to the other problem. High order can be obtained by sufficiently high Another idea that is more straightforward to use is to use a quasi-Newton method instead of relaxation to improve the convergence of waveform iteration. This has been very successfully applied for non-waveform partitioned black-box FSI, and, due to its efficiency, is now state of the art in this field. The basic variant is called Interface Quasi-Newton Inverse Least-Squares (IQN-ILS) [15,16]. As a non-linear solver, the method is also known as Anderson acceleration [17,18]. Important recent improvements have been parallel execution of solvers [19], filtering [20], reducing tuning parameters [21,22,23] and the generalization to multiple coupled solvers [24]. The resulting methods lead to very few coupling iterations, are robust without tuning parameters, and have linear compute and memory complexity in terms of interface degrees of freedom due to matrix-free formulations and due to a mesh independent convergence rate. Implicit coupling together with state-of-the-art IQN methods overcome added mass instability and convergence issues and can therefore also be applied for bio-mechanical applications [25,26].
In this paper, we combine waveform iteration (WI) with IQN, with the goal of getting a novel fast partitioned blackbox coupling method that still has high order and allows for different time step sizes in different solvers. To do this, we consider versions of the quasi-Newton method that differ in terms of which subsets of time step data in the so-called coupling windows are used to approximate the system Jacobian. A similar approach has been followed in [27] for a less general setting. There, subcycling, i.e., using several time steps of one solver within one time step of the other one, is considered. For a 1D problem and certain time step ratios, the stability of the time integration is analyzed, which is confirmed by numerical results in 2D. Our approach allows different numbers of time steps of both solvers within a so-called coupling window such that the time steps of involved solvers no longer have to be multiples of each other (called multi-rate time stepping).
For the description and the numerical experiments, we only consider so called Dirichlet-Neumann type coupling methods. These are a basic building block in FSI that only use boundary conditions that are available in the subsolvers anyhow. To this end, Dirichlet, respectively Neumann data at the interface are exchanged between the solvers. A generalization for other coupling conditions is possible, but requires additional numerical testcases to evaluate the resulting performance. We do not adapt time steps dynamically in time, but we would like to point out that time adaptivity is straightforward to use in waveform iterations. This is important, since it can give dramatic performance improvements [28,29].
We demonstrate up to third order in time for the combined method. The iteration numbers do not vary much when varying the time step ratios and are rather small. We test the proposed algortihms by using the coupling library preCICE [30] with the finite element frameworks FEniCS [31] and Nutils [32]. We, therefore, use IQN from preCICE and implement WI in a new middle layer between preCICE and the coupled solvers.
The structure of the article is as follows: In Sect. 2, we explain the mathematical framework for both concepts -waveform iteration for multi-rate coupled time stepping on the one hand and quasi-Newton on the other hand -separately and then show how we can combine them. We then describe in Sect. 3 how we realized a prototype implementation in a middle software layer between coupled solvers and preCICE. In Sect. 4, we show numerical results for a coupled heat equation and an FSI scenario, both in 2D. We conclude in Sect. 5.

Coupling Algorithms
Before introducing the implementation in the coupling library preCICE, we present the details of our coupling approach allowing for multi-rate time stepping, higher-order accuracy in time, and fast iterative convergence in several steps: Section 2.1 introduces our notation and the general form of the coupling formulation as used in standard loworder approaches. Section 2.2 explains how to extend this basic coupling with a waveform approach, which includes information from individual time steps to increase the approximation order. Section 2.3 explains how to construct a quasi-Newton scheme for a general setting. Section 2.4 combines both concepts to define quasi-Newton variants for multi-rate higher-order coupling.

Basic Notation and Multi-rate Coupling via Single Value Coupling
We consider a coupled problem on two non-overlapping domains Ω D and Ω N . We use a partitioned approach, i.e., we rely on stand-alone solvers for each domain, which we regard as black boxes providing access only to input and output data, but not to discretization details. The coupling of solvers happens at the coupling interface Γ = Ω D ∩ Ω N . We couple the solvers via Dirichlet-Neumann coupling. The Dirichlet solver on Ω D takes a Dirichlet boundary condition at the coupling interface Γ from the Neumann solver on Ω N , while the Neumann solver takes a Neumann boundary condition at the coupling interface Γ from the Dirichlet solver. For FSI, the natural choice is to use the fluid solver as Dirichlet solver with Dirchlet boundary values being velocities or displacements, and the solid solver as Neumann solver with Neumann boundary values being forces or stresses (see Sect. 4.2). For other coupled problems such as heat transfer, which we use as a second application test case in Sect. 4.1, this choice has to be made based on material parameters of the two media [33]. Here, Dirichlet boundary values represent temperature values at the coupling interface and Neumann boundary values the heat flux computed from the internal temperature field.
Both solvers perform time stepping (explicit or implicit) using their respective schemes in a given time window [t ini ; t ini + ∆t] with initial time t ini and window size ∆t. Within each window, we assume them to use a fixed, but possibly different time step size. The Dirichlet and the Neumann solver, thus, perform n D or n N time steps of size δt D and δt N , respectively (∆t = n D δt D = n N δt N ).
We now define the operators D SC and N SC as the action of our black-box solvers over the complete window, given Dirichlet boundary values c D end at the end of the coupling window (and initial values at the boundary and inside the domain, which we omit as arguments for better readability), the Dirichlet solver returns Neumann boundary values c N end at the end of the coupling window. The latter are used as an input for the Neumann solver that returns Dirichlet boundary values c D end , again: The coupling condition at the interface can now be expressed by means of the fixed-point equation acting only on the Dirichlet interface data c D end at the end of the window as the two codes exchange only bounday data at this single point in time. H SC denotes the corresponding fixed-point operator. We refer to coupling methods solving Eq. (1) in this contribution as single value coupling and denote them as SC(n D , n N ). Further, the execution of n D and n N time steps in the solvers before coupling at the end of time window is called multi-rate time stepping. the special case with either n D = 1, n N = 1 or n D = 1, n N = 1 is commonly called subcycling [34,35].
Eq. (1) entails a Gauss-Seidel-type coupling, executing D SC and N SC one after the other, while the roles of D SC and N SC could also be swapped leading then to a fixed-point equation in terms of c N . Furthermore, we could instead also consider a Jacobi-type coupling [19]. For the sake of clarity, we omit both additional variants in this contribution. To achieve a stable coupling also for strongly coupled problems and to increase accuracy, we consider an implicit (or strong) coupling, i.e., we repeat D SC and N SC multiple times per time window until convergence of c D end . This corresponds to a fixed-point iteration for Eq. (1). If we execute both solvers only once per time step, this corresponds to classical Godunov splitting. It always yields only first-order in time, even if we apply implicit coupling iterating Eq. (1) until convergence. We can summarize multi-rate single value coupling in an algorithmic setting as 1. Initialize c D 0 with the value c D end from the previous time step.
2. Solve the Dirichlet problem (with n D time steps): 3. Solve the Neumann problem (with n N time steps): 4. If c D k+1 − c D k is small enough: Go to the next time window.
5. Else: k = k + 1 and go back to item 2.
Note that, in general, this iteration does not converge and requires an additional acceleration step such as underrelaxation or quasi-Newton in item 5 in the algortihm above. We describe details for this acceleration in Sect. 2.3.

Extension to Waveform Iteration
Waveform iterations (WI) are an elegant way to increase the order with minimal use of the solvers' internal details, i.e., well suited for black-box coupling. In contrast to the method from the previous section, WI exchange more data than just the solutions at the end points.
The basic idea is to use a time-continuous representation of c D instead of only the value c D end at the end of the coupling window to connect the two solvers. We can construct representations of the continuous functions c D and c N from all time steps in the window [t ini ; t ini + ∆t] by interpolation based on a finite dimensional basis. We, thus, define c D and c N as functions in time: If both solvers execute independent time steps within the time window as introduced above, they can now evaluate the continuous representations c D and c N as boundary conditions wherever necessary. If the Dirichlet solver, for example, executes three time steps using the trapezoidal rule and δt D = 1 3 ∆t, it samples c D at four points in time (t ini , t ini + 1 3 ∆t, t ini + 2 3 ∆t, t ini + ∆t). If the Neumann solver uses two implicit Euler time steps of size δt N = 1 2 ∆t, it evaluates c N at two points in time (t ini + 1 2 ∆t and t ini + ∆t). If one of the solvers uses higher-order time integration (such as a Runge-Kutta RK4 scheme), necessary substeps can now also be retrieved from c D or c N . Giving continuous boundary value representations to our solvers yields new operators These new operators are composed of two parts, the actual solver actions producing time-discrete data at the end of each of their n D or n N time steps from a time-continuous boundary condition and interpolation operators * I p,c0 : (c 1 , . . . , c n ) → c generating piecewise polynomial interpolation of degree p from n time-discrete points and an 'initial value' c 0 at t ini (i.e., a total of n = n + 1 datapoints). Here, we use an interpolating B-Spline curve with n + p + 1 knots [36] † : As we only use n + 1 data points in time and, in particular, neither use values from previous windows nor derivatives in time for interpolation, we have the restriction p ≤ n for I p,c0 . We use piecewise linear (p = 1), quadratic (p = 2), or cubic (p = 3) interpolation. Piecewise linear interpolation requires two data points, i.e., n D ≥ 1, n N ≥ 1, and is, thus, always possible, piecewise quadratic requires at least three samples, i.e., n D ≥ 2, n N ≥ 2, piecewise cubic at least four samples, i.e., n D ≥ 3, n N ≥ 3.
The definition of the new operators D WI and N WI yields a new fixed-point equation operating on functions in time or, expressed in terms of the time-discrete steps We denote coupling methods solving Eq. (2) or Eq. (3) as waveform iteration WI(n D , n N ; p) and the corresponding fixed-point operator as H p n D ,n N . Figure 1 compares SC(2, 5) and WI(2, 5; 2) in a schematic way. In an algorithmic setting, we can summarize waveform iteration as 6. If c D k+1 (t) − c D k (t) small enough: Go to the next window.
7. Else: k = k + 1 and go back to item 2.
Note that, in the general case, interpolation of different degrees p D and p N may be used for the two problems. For the sake of simplicity, we restrict ourselves to the case where p D = p N = p. Figure 1: Comparison between single value coupling (SC (2,5), left) and waveform iteration (WI(2, 5; 2), right). Exemplary sample values are depicted for t ini + δt N and t ini + δt D . Note that SC(2, 5) is not equivalent to WI(2, 5; 0), i.e., waveform iteration with constant interpolation. This is due to the fact that single value coupling does not use piecewise constant interpolation, but globally constant interpolation over the whole time window.
At this point, we would like to add some comments on which combinations of interpolation methods and time stepping schemes have to be used to (i) reproduce the solution of a problem with known polynomial solution of degree p in time exactly and to (ii) achieve a desired approximation order p in time in general: For (i), we have to use piecewise polynomial interpolation of at least degree p to represent this solution exactly at the interface. For time integration, a scheme of at least order p has to be used.
For (ii), we have to use a pth-order polynomial interpolation, since we can at most afford an error of O(∆t p+1 ) per time window, which accumulates to an error of O(∆t p ) over O(∆t −1 ) time steps. For time integration, again a scheme of at least order p has to be used.
Accuracy is in a way limited by the number of time steps n that defines the maximal polynomial degree p of the interpolation scheme. In future work, we intend to remove this restriction by using data from previous windows.
Also this iteration needs acceleration methods that we describe in a general formulation in Sect. 2.3 and more specifically for waveform iteration in Sect. 2.4.

Interface Quasi-Newton Methods
After having introduced two specific fixed-point operators H SC and H p n D ,n N in the previous two sections, we now consider a general fixed-point equation and the associated residual, and explain how we can formulate an accelerated iteration with A denoting the acceleration operator. We stop the iteration, when the norm of the coupling residual R(x k ) is small enough. If A is the identity, we get a classical fixed-point iteration (Picard iteration), which only converges if H is a contraction. For FSI, this is typically not true [37]. In early FSI days, the acceleration operator A has been realized as underrelaxation, e.g., based on a dynamic Aitken scheme [38]. A better approach is to reuse past iterates to build up an approximate Jacobian of R and use this in quasi-Newton iterations [15,16]. To avoid linear dependencies between information from previous iterations, we have to perform a modified Newton iteration starting from the result of the pure fixed-point iteration (for details, see [26]): as an approximation of the inverse of the Jacobian ofR. To compute JR(x k ), we collect input-output information from past iterates * of H in tall and skinny matrices V k , W k ∈ R m×k , m k, For transient coupled problems, we need to solve a fixed-point equation per time window. In this case, it is beneficial for efficiency to also include iterates from past windows in V k and W k [26]. This variant is however beyond the scope of this paper and subject to future work. The matrices V k and W k define the multi-secant equations for the inverse Jacobian To get the classical IQN-ILS [15], we close Eq. (4) by In practice, the Jacobian is not stored, but calculated in matrix-free form from via a QR-decomposition of V k ,

Quasi-Newton Waveform Iteration
Now we combine the two ideas we discussed above: (i) We have used a waveform iteration approach dealing with multi-rate settings, i.e., different numbers n D and n D of time steps of size δt D and δt N within a window of size ∆t in the involved solvers, such that ∆t = n N δt N = n D δt D , and, at the same time, maintaining higher order. (ii) We have presented the concept of quasi-Newton methods as an acceleration approach for the resulting interface fixed-point equations. Now, we show how we can combine the two parts towards a robust, efficient, and accurate coupling for partitioned simulations with black-box solvers.
The missing piece here is the answer to the question which residual components we should consider to build our quasi-Newton scheme. For a multi-rate setting with given n D and n N based on the fixed-point equation Eq. (1), the choice is canonical, i.e., we use x k := c D k end . If we have m D degrees of freedom at the interface, c D end ∈ R m D , we get V k , W k ∈ R D × . We call this variant QN-SC. * We use a simple underrelaxation for the first iteration.
For waveform iteration, we have formulated our Dirichlet-Neumann coupling in terms of the fixed-point equation Eq. (3). We have all n D time steps of c D included in the fixed-point equation Eq. (3) with the respective fixed point operator H p n D ,n N . The canonical quasi-Newton variant for this system concatenates all time steps' data in a large vector for both input and output of the operatorR. Thus, we get matrices V k and W k in R (m D ·n D )×k . We call this variant QN-WI. This method is computationally more expensive as we have to deal with a substantially larger interface system than for single value coupling. Typically, the size of the interface system has, however, a negligible influence on the overall performance [26].
A variant to reduce the size of multi-secant system Eq. (4) is to measure the residuals not at all samples, but only a subset. To test such a setting in the following, we only consider one specific case: we only use the last sample c D n D , i.e., In this case, the approximate matrix J −1 R is no longer a square matrix but in R (m D ·n D )×m D . However, we do not explicitly calculate J −1 R , but solve a least squares problem as in Eq. (5) with α ∈ R k as before. We call this variant reduced quasi-Newton rQN-WI.
Remark 1: If we do not include all time steps in the input vector x and, thus, in the matrix W k , but simply restrict the fixed-point iteration formulation entirely to the last step n D , we would also only update the entry c D n D and, thus, not generate an appropriate update of the whole waveform polynomial representation of c D .

Remark 2:
The method of De Moerlosse et al. [27] corresponds to WI(1, n; 1) or WI(n, 1; 1), i.e., one solver executes n time steps δt within one window, while the time step of the other solver is equal to the window size ∆t. Linear interpolation is used to transfer information from the larger time step (∆t) solver to the other solver (δt = ∆t/n). In this case, QN-WI and rQN-WI are equivalent. In our general setting, we impose no restrictions on time steps of the involved solvers except that they 'meet' at a common time at the end of the window, i.e., the window size ∆t is a multiple of the solver's time steps δt. Therefore, our generalization also allows to naturally handle more than two coupled solvers.
Remark 3: In literature, waveform iteration, i.e., H p n D ,n N , is usually combined with (constant) underrelaxation. We compare our quasi-Newton variants to this standard technique in Sect. 4 and call the latter rel-WI.
Summarizing, we can state that QN-WI and rQN-WI can be considered as quasi-Newton variants in space-time, whereas QN-SC only acts in space.

Software
We now describe how we realize the aforementioned multi-rate coupling algorithms in software. In fact, the bigger picture of our research is to implement an elaborate and sophisticated treatment of the temporal dimension into the widespread open-source coupling software preCICE [30]. This manuscript is an important milestones for this endeavour. preCICE is a library that allows to couple existing (legacy) simulation codes to complete multiphysics simulations in a minimally invasive way. preCICE is written in C++, but also offers bindings for C, Fortran and Python. For many community codes (such as OpenFOAM, CalculiX or SU2), ready-to-use adapters are provided [39]. The setup of coupled codes and of the data to be exchanged is configured at run-time through a global XML file. Various quasi-Newton algorithms are already implemented in preCICE including IQN-ILS [15] as introduced in Sect. 2.3.
Whereas single value coupling, as introduced in Sect. 2.1, is based on exchanging the values of the coupling variables at window end points, waveform iteration, as introduced in Sect. 2.2 needs time-continuous representation of the coupling variables over complete windows. This important mathematical difference is also reflected in the software requirements. preCICE currently only offers interface methods such as data = read data(...) and write data(data,...) to read and write values at a single point in time, which can be interpreted in most cases as the window end t ini + ∆t. Afterwards, the interface method advance() triggers the exchange of these values between both solvers and the computation of the quasi-Newton acceleration. This set of methods provides everything needed to realize single value coupling, which, in fact, is already fully supported [40] by the preCICE version we use in this contribution, v1.6.1 . Waveform iteration, however, requires some notion of time in the application programming interface (API) of preCICE, which would make a fundamental redesign of the coupling logic necessary.
Waveform Bindings As the choice of the most suitable waveform iteration algorithm for preCICE is still an open research question, we instead aim for a prototype software design, which allows to easily test various multi-rate Solver libprecice: u = read vector data("Forces1") waveform-bindings: u = read vector data("Forces", time) solver adapter approaches without requiring immature refactoring of preCICE. A mature implementation of waveform iteration in preCICE is subject to future work. We, therefore, decided that the prototype should implement waveform iteration in a new middle layer called waveform bindings, sitting between the adapter code and the preCICE library. Whereas preCICE does the communication and the quasi-Newton acceleration of the coupling data, the waveform bindings handle the time-continuous representation. With this solution, we do not require any alterations to preCICE itself. Figure 2 sketches the layered architecture.
The essential trick to realize this separation of concerns is to treat a complete time window like a single time step in preCICE. To this end, the data fields of preCICE need to be altered: instead of one data field, such as Displacements, we now have one data field per time step of a window, Displacements1, Displacements2, and so forth. preCICE can be configured to handle and communicate multiple fields, but has no notion of time for these fields, they are just several data fields for a single preCICE time step. The waveform bindings, on the other hand, have this notion of time: Displacements1 are the values after the first time step within a window, Displacements2 after the second time step, and so forth. The API of the waveform bindings now offers methods such as write data(data,..., time) to write timestamped data at a specific time step t ∈ {t ini + δt, t ini + 2δt, . . . , t ini + ∆t} and data = read data(..., time) to read data at an arbitrary t ∈ [t ini , t ini +∆t]. If the latter does not coincide with a given time step, an interpolated value is returned. We use the SciPy package * to compute the interpolant, a B-Spline of degree one to four using the SciPy function interpolate.splrep. The waveform bindings also offer a function advance() which in turn calls the preCICE advance() upon completion of a window to communicate the discrete values (such as Displacements1, Displacements2, . . . ) between both solvers. In the preCICE configuration, the quasi-Newton acceleration can now be configured to be computed on the full set of data fields or only a reduced subset as explained in Sect. 2.4.

Application Programming Interface
To only require minimal modification to existing adapters and testcases, the main API design goal of the waveform bindings † is to mimic the preCICE API as closely as possible. In fact, solely the data access functions are altered as explained above to include time as an additional input argument. Therefore, rewriting an existing coupled code to use the waveform bindings instead of preCICE directly requires modifying less than ten lines of code, which makes alongside testing of single value coupling and waveform iteration easy.
Listing 1 exemplarily shows how the waveform bindings are used to couple a structure solver. The code is nearly identical to a coupled code directly using the preCICE API. In the following, we explain the slight differences. For a detailed explanation of the preCICE API itself, we refer to the respective documentation ‡ . The obvious change in line 1 is to import waveform bindings instead of precice. Afterwards, the waveform bindings need additional configuration in lines 11 to 13. The number of time steps per window on both sides as well as the interpolation strategy need to be defined. For the data access in lines 23 and 31, the preCICE API is extended as mentioned previously. In the example, the structure solver only samples force values at the end of each time step t + δt in line 23, as we use an 1 import waveform_bindings as precice 2 3 solver_name = "StructureSolver" # Structure solver is the Neumann solver N 4 interface = precice.Interface(solver_name) # create handle to waveform bindings API 5 interface.configure("precice-config.xml") # usual preCICE configuration Limitations The layered architecture, however, also comes with technical shortcomings. Firstly, as the waveform bindings build upon the Python bindings of preCICE, currently only codes in Python can be coupled. Besides this language restriction, the software concept comes with a few additional limitations as expected for a prototype. For quasi-Newton acceleration, preCICE allows to reuse iterations from previous time steps to achieve faster convergence. As preCICE has no notion of time for the samples Displacements1, Displacements2, . . . , the reuse strategy would give wrong values. We, therefore, simply switch off this functionality for all experiments in the next section. For the same reason, we cannot use underrelaxation in preCICE, neither directly nor as starting value for the quasi-Newton methods. Otherwise the first relaxed iteration would be a weighted combination between the current solution and the waveform of the last window. Waveform relaxation rel-WI as introduced in Sect. 2.4, however, needs a weighted combination between the current solution and a constant waveform extrapolated from the end value of the previous window. As a remedy, we realized underrelaxation directly in the advance() function of waveform bindings and switched it off in preCICE. Last, each coupled solver needs to know the number of time steps per window for both sides at configuration, see Listing 1, lines 11-13, contradicting the flexibility argument of black-box coupling. We aim to eliminate all these limitation with a mature implementation of waveform iteration within preCICE in future work. Within preCICE in particular means to maintaining backwards compatibility. To which extent this is possible is an open research question.

Numerical Results
We consider two test cases to evaluate the performance of the proposed algorithms both in terms of convergence properties of the quasi-Newton iterations and in terms of the achieved approximation order in time: (i) a simple 2D heat equation (diffusion equation) with known analytic solution, where the computational domain has been divided artificially into two partitions, (ii) a fluid-structure interaction scenario, where a solid truss is deformed by the surrounding fluid flow. Both examples are discretized in space using finite elements. To comply with the restriction of the waveform bindings to Python, we use the finite element frameworks FEniCS * [31,41] and Nutils † [32]. We choose two different codes to show-case the black-box methodology, for which testcases and adapters were already available. For interfacing between FEniCS and preCICE, we make use of the FEniCS adapter ‡ , which converts between FEniCS and preCICE data structures and realizes the rewinding of time steps for implicit (or strong) coupling. Nutils, on the other hand, does not use an adapter, but directly accesses preCICE or the waveform bindings. Code and instructions for running the examples can be found on https://gitlab.lrz.de/precice/ijnme2019-experiments.

Conjugate Heat Transfer
With this academic test case, we want to test two things: (i) How well does QN converge for WI, which combination works best? (ii) Can we recover higher order in time? We look at both in this order. As most of our conclusions are based on a known exact solution, the numerical methods in the solvers have been chosen accordingly. In particular, the solvers use finite elements of sufficiently high order to be able to solve the problem without spatial discretization error. as described in the FEniCS tutorials book [42]. In the following, we extend this example, such that it can be used to assess the quality of the previously discussed algorithms. For details on finite element discretization and solution strategies in FEniCS, we refer the interested reader to the FEniCS tutorials book [42].

Scenario Setup
The right-hand side f as well as the boundary conditions in Eq. (6) are manufactured such that we get the analytical solution u exact (x, y, t) = 1 + g(t)x 2 + 3y 2 + 1.2t.
Using P 2 finite elements and a sufficient approximation order in time allows us to recover u exact and to compute the exact heat flux q exact = ∇u exact via finite element projection. and Ω N . We extract the temperature u or the normal component of the heat flux q ⊥ on Γ from one participant to be provided for the other as Dirichlet boundary condition u = c D or Neumann boundary condition n · ∇u = q ⊥ = c N , respectively. For time stepping, we use implicit Euler (IE), the trapezoidal rule (TR) or a fourth order spectral deferred correction (SDC) scheme [43], depending on the choice of α and the goal in terms of approximation order in time. We construct the SDC scheme from an implicit Euler time integration scheme with three Gauss-Lobatto collocation nodes at t ini + iδt, t ini + (i + 0.5)δt and t ini + (i + 1)δt, i = 1 . . . n and K correction sweeps per time step of size δt.
Quasi-Newton Convergence In Section 2.4, we introduced two variants combining waveform iteration coupling with quasi-Newton acceleration, QN-WI and rQN-WI. In addition, we consider classical underrelaxation for waveform iteration rel-WI and the plain fixed-point iteration full-WI. For rel-WI, we apply an underrelaxation of 0.5, which is optimal for the semidiscrete case [7]. We compare these methods concerning the speed of convergence. As a reference, we compare the convergence of all WI coupling acceleration methods to the convergence of QN-SC for SC(n D ,n N ). The latter quantifies the price we pay for higher order in time in terms of iteration counts per time step.
As we are not considering the approximation order of our time stepping scheme in this paragraph, we use implicit Euler time integration combined with piecewise linear interpolation for WI in both domains (WI(n D ,n N ; 1)). Note that for the setups in which waveform iteration is used (QN-WI, rQN-WI, rel-WI, full-WI), the termination criterion considers interface values at every time step. For the basic variant with plain single value coupling (QN-SC), only the values at the end of the time window are considered. We use a moderate relative coupling tolerance of 10 −5 to reach reasonable runtimes. For all QN acceleration schemes, we use a QR2 filter [20] for the QR decomposition with limit = 10 −3 and a residual-sum weighting of the individual sub vectors [26].
As test case, we use g tri (t), since for g pol (t), the g(t)x 2 term quickly dominates the equation leading to a much simpler coupled problem. See Figure 4 for the iteration per time step for QN-WI for different choices of g(t). Only the periodic g tri (t) shows a rather constant behavior of the iterations per time step.
We measure the average number of iterations required to terminate in every window until T = 10 and present the respective results in Tab. 1. We evaluate the performance for different window sizes ∆t = 0.1, 0.2, 0.5, 1.0, 2.0, 5.0 and different multi-rate settings with n D , n N ∈ {1, 3, 5}. Our study showed that full-WI, i.e., waveform iteration without acceleration, does not converge within 100 iterations per time step. Thus, we present only iteration counts for the other four variants. The most important findings are summarized in the following: 1. For rel-WI, the number of coupling iterations is almost independent of the multi-rate setup. All quasi-Newton variants need two to three times fewer iterations than rel-WI. 2. Overall, QN-WI is more robust, while rQN-WI is more efficient for special cases. In general, rQN-WI seems to have difficulties for setups with many substeps on both sides, while QN-WI can better cope with these setups. rQN-WI and QN-WI perform equally well for WI(n D , 1; 1) as both methods are equivalent for these cases. timestep number of iterations g pol (t) = (t + 1) 1 g pol (t) = (t + 1) 2 g pol (t) = (t + 1) 3 g tri (t) = sin(t) 3. For all variants, the number of iterations slowly decreases with decreasing window size ∆t. This effect is observable, but not very significant. Thus, there is potential to make coupling more asynchronous by applying waveform iteration on larger windows. 4. In comparison to existing literature [26], the number of iterations needed in our experiments is relatively high.
We explain the high iteration count with the fact that our algorithms currently do not reuse data of previous time windows. Reusing iterations is a state-of-the-art approach for reducing the number of iterations needed, which remains as future work. 5. QN-SC performs well and does not suffer from multi-rate, even though the time steps (δt) are ignored by the acceleration scheme. Note, that QN-SC uses an easier convergence criterion than the other two quasi-Newton variants, since only the last sample is considered at the end of the window and recall that this method does not allow to achieve more than first order accuracy in time.
From these findings, we conclude that the convergence speed of the quasi-Newton based variants QN-SC, QN-WI and rQN-SC is relatively similar. The underrelaxation based variant rel-WI performs significantly worse. In the remainder of this paper, we will therefore focus on the quasi-Newton based variants.  Table 1: Heat-heat coupling: average number of coupling iterations per time step for various coupling schemes and multi-rate setups SC(n D , n N ) and WI(n D , n N ; 1), respectively, and different window sizes ∆t. As time integration method, implicit Euler is used for both solvers in all examples. The highest and the lowest number per column are always set in bold face.
Approximation Order in Time Above, we have examined the convergence speed of different iterative coupling algorithms and concluded that only quasi-Newton accelerated algorithms are able to reach a competitive number of iter-ations. In the following, we concentrate on the accuracy of the quasi-Newton based multi-rate setups WI(n D , n N ; p) and SC(n D , n N ).
In a first step, we use g pol (t) and study for which setups and p we can recover the exact solution. As stated in Sect. 2.2, a time stepping scheme of at least order p and and waveform interpolation of at least degree p are required to achieve this. We consider the numerical solution u of the partitioned heat equation to be equal to the exact solution u exact , if the L 2 (Ω) error is smaller than 10 −12 in every time step. To avoid errors being introduced by the iterative coupling procedure, we use a very strict relative coupling tolerance of 10 −12 . As explained above, our discretization uses piecewise quadratic finite elements for the computation of u to be able to represent u exactly in space.
Using QN-SC for SC(n D , n N ), we observed that only for the trivial non-multi-rate cases with n D = n N = 1 using IE and α = 1 or TR and α = 2, where no interpolation is necessary, the exact solution can be recovered. The only multi-rate settings, where the exact solution could be obtained are the edge cases SC(1, n N ), n N ∈ {1, 2, 3, 5} with α = 1 and IE. Here, the starting Dirichlet solver only requiring data at the end of the time window computes the correct flux for the end of the Neumann window, which is the only place where we measure the error. We expect the single time steps on Neumann side to not give the exact solution. For other multi-rate setups (such as SC(5, 2)), we could not obtain the exact solution, which was to be expected.
In a second step, we determine the convergence order of time stepping for the setup using g tri . For the convergence study, we determine the L 2 (Ω) error of the numerical solution u with respect to the exact solution u exact . We consider the error at final simulation time T = 1 and decrease the window size ∆t. We use a moderate coupling tolerance of 10 −5 . Figure 5 shows a convergence study for WI(5, 3; p) and SC (5,3). Note, that we intentionally pick a setup where interpolated values are required by both participants of the coupled simulation at every time step. If WI(5, 3; p) is used for coupling in combination with a time stepping method of order ≥ p, we expect convergence order p as detailed in Sect. 2.2 . We observe this for IE and TR, combined with linear and quadratic interpolation, respectively. For the fourth-order SDC and cubic interpolation, we observe a convergence order of approximately 3.5.
Additionally, we observe that the accuracy of SDC is bounded by the coupling tolerance for small ∆t and the error does not scale with the time step anymore. SC(5, 3), on the other hand, gives slow to no convergence and significantly higher errors for IE and TR time integration methods.
To investigate the effect of multi-rate schemes being used, we perform another convergence study, where TR with WI(2, 2; p), WI(5, 2; p), WI(2, 5; p), and WI(5, 5; p) is used (see Figure 6). The degree of the interpolating polynomials is irrelevant for cases, where no interpolation is needed, i.e., WI(2, 2; p) and WI(5, 5; p) reach second order regardless of the interpolation scheme being used. Note that this effect is only expected to occur for time stepping schemes such as TR or IE, where no additional function evaluations in time are required. For SDC, we did not observe this effect. For WI(5, 2; p) and WI(2, 5; p), piecewise quadratic interpolation (p = 2) allows us to reach second order of the time stepping scheme, as expected. Here, the benefit of multi-rate is clearly visible: WI(2, 5; 2) shows a significantly lower error than WI(5, 2; 2) and almost the same as WI(5, 5; 2), which means that higher resolution in time for N is more beneficial than for D. Presumably due to the term x 2 sin(t) in u exact , which is significantly larger in Ω N .
If only piecewise linear interpolation(p = 1) is applied, second order can be reached for certain cases (e.g. WI(2, 5; 1)). This might be due to the smaller time steps on the Neumann side, from which the Dirichlet solver reads the interpolated data. In other words, it can be assumed that what we observe is the convergence of the dominating spatial error of the Dirichlet solver (using a 2.5 times larger time step). WI(5, 2; 1) shows the expected deterioration of the order below second order. We observe that, even though reaching second order convergence, the accuracy of WI(2, 5; 1) is significantly lower than for the equivalent case with higher-order interpolation WI(2, 5; 1).  To summarize, we conclude that the converged solutions of QN-WI for WI(n D , n N ; p) fulfil the expectations in terms of accuracy. SC(n D , n N ), on the other hand, yields only first order accuracy. We observe that interpolation plays an important role for waveform iteration. If we want to avoid order degradation, we have to use an interpolation with a sufficiently high polynomial degree. The required degree depends on the order of the time integration method, usually we require the polynomial degree p to be at least equal to the order of the time stepping scheme. However, we notice that, for exceptional cases, p lower than the order of the time stepping scheme can be sufficient.

Fluid-Structure Interaction
We now study whether we can generalize our observations from the relatively simple heat transfer problem to a more involved fluid-structure interaction problem. We show that QN-WI coupling allows us to couple two different secondorder time integration methods, a trapezoidal rule used in the fluid solver and a Newmark-β method used in the solid solver to an overall second-order coupled simulation even when combining different time step sizes in a multi-rate setup. For the solid domain, we use the linear elasticity solver developed in FEniCS and validated against CalculiX as part of the Bachelor thesis of Richard Hertrich [44], to which we also refer for details. We use a structured mesh with 5 × 50 cells, where each cell is cut into two triangles and P 2 finite elements for discretization. For time-integration, as mentioned previously, we use a second-order Newmark-β method. The resulting linear system is solved by a direct solver.
For the fluid domain, we implemented a new incompressible Navier-Stokes solver in Nutils specifically for this testcase. We use a structured adaptive mesh as depicted in Figure 7 left. We formulate the fluid equations fully-coupled and fully non-linearly in an arbitrary-Lagrangian-Eulerian framework and discretize with Q 2 − Q 1 Taylor-Hood elements. The non-linearity is resolved by Newton's method with line-search up to an absolute residual of 10 −9 . A direct solver is used for the resulting linear systems. The mesh displacement is modeled by a simple Laplace equation, discretized by Q 1 elements and again solved with a direct solver. For time integration, we make use of the second-order trapezoidal rule, meaning we replace the time derivative of the velocity by the finite difference (u i+1 − u i )/δt and use a linear average between the variables of the current and the previous time step for the diffusive and the convective term. To avoid pressure oscillations, however, we take the pressure gradient and the continuity term only at the new time level.
To get to second order in an FSI setup, a few enhancements of the fluid time integration are necessary. Firstly, a second-order backward difference scheme is needed to compute the mesh velocity from the mesh displacement, where u mesh is the mesh velocity, d the mesh displacement, and the index i the time step. Secondly, the computation of forces acting on the solid needs to be constructed carefully. We compute fluxes over the fluid boundaries by evaluating the residual of the weak formulation at the current solution and extracting the coupling boundary [45]. Afterwards, we compute forces from the fluxes. To get second-order-accurate forces, we need to adapt the temporal discretization of two terms of the weak form compared to the trapezoidal form explained above. As time derivative of the velocity, we here use a second-order stencil, where u is the velocity. Furthermore, the trapezoidal rule with a fully-implicit pressure gradient as explained above only results in second-order pressure values at the midpoint p i+0.5 . Using these values at the time level i + 1, as often done, is, however, only first-order accurate. To get second-order-accurate values at this time level, we extrapolate in the pressure gradient term.
As the fluid solver and the solid solver use non-matching meshes at the coupling boundary, we require data mapping methods between both meshes. We, therefore, use radial-basis function interpolation with thin-plates splines as provided by preCICE [46]. In the fluid solver (Nutils), we directly read the displacement values at the Gauss points. FEniCS, however, requires a functional description of boundary conditions. We, therefore, read force values at the mesh vertices and construct the boundary function again by a radial-basis function interpolation; this time in the FEniCS adapter [44].
To validate the new fluid solver, its mesh movement, and its force computation, we compare the Nutils-FEniCS results against OpenFOAM-FEniCS results from the preCICE tutorials. For both simulations, we use ∆t = 0.01 s and QN-SC(1,1) as coupling algorithm. Figure 7 shows a close agreement for the x-displacement of the tip until T = 5.0 s for an initial fluid at rest.

Convergence Study
To study the temporal convergence order of a coupled simulation, we compute the Perpendicular Flap testcase until T = 0.01 s with decreasing window sizes ∆t l = 0.0025 · 2 −l s for time levels l = 0, 1, . . . , 4.
For all levels, we compute QN-WI with WI(1, 1; 1), WI(2, 3; 2), and WI(3, 2; 2). As reference solution for all methods, we use WI(1, 1; 1) with time level l = 6. A crucial ingredient to achieve second order are suitable initial conditions. We start the fluid solver from an initial Stokes solution and then scale the forces on the structure until T /2 with a C 1 -continuous sin 2 (π(t + δt)/T ) ramp. We measure the error in the displacement values at the coupling boundary to the reference solution d ref .
To smooth out variations, we consider the error over the complete time interval  2) appear to be shifted by one time level, which indicates that the solid time step size dominates the overall error. This again shows the great potential of multi-rate time stepping as the fluid problem is the far more expensive one. Thus, using a smaller time step size for the structure solver is almost for free. We compute the coupling iterations up to a relative residual error of 10 −6 and list the average number of iterations in Tab. 2. We can observe a slight increase of required iterations with decreasing window size, contrary to our observations for the heat transfer testcase in the last section. This does not come as a surprise as, for incompressible FSI and vanishing structural mass, a similar behavior is reported in literature [37]. It opens, however, the possibility for optimization as increasing the window size while keeping the time step size constant might lead to a more efficient numerical method, besides also having all the computing advantages of increased asynchronicity (such as reduced communication and better balancing of varying computational load). A detailed study is, however, beyond the scope of this paper. Lastly, comparing the required iterations of WI(2, 3; 2) with WI(3, 2; 2) could indicate that the structural time step size is the critical part.

Conclusions and Outlook
This contribution shows that using waveform iteration in a black-box approach for partitioned multi-physics simulations is possible. Black-box means here in particular that the required interpolation in time can be computed without using discretization details of the coupled codes. An elegant prototype software design using a middle-layer called waveform bindings allows us to test arbitrary mutli-rate settings with the coupling library preCICE without any alterations to the latter. Arbitrary multi-rate settings means that we allow any (and different) numbers of time steps in the involved solvers within a so-called coupling window. Our results show that (i) the implementation actually achieves the predicted order of consistency in time and that (ii) we can generalize interface quasi-Newton methods to the space-time domain. For more complex scenarios such as fluid-structure interaction, we observed that actually achieving higher order requires careful consideration of all solver components, of mesh moving, and of evaluations of coupling variables.
Our current implementation has a couple of limitations, which we intend to tackle in future work: (i) We have to move the functionality of the waveform bindings to preCICE and decide how to do this while maintaining backwards compatibility of the application user interface as far as possible. This also removes the technical restriction to Python.
(ii) Features that optimize the performance of quasi-Newton schemes need to be provided also for waveform iterations. This holds in particular for re-use of previous time step data and improved initial guesses for the initial approximation (currently constant extrapolation) of interface data before the first iteration in a new time window. (iii) Our methods should be generalized to adaptive time stepping, which is less of a mathematically unsolved problem, but a software issue as it might require varying numbers of time steps and, thus, varying numbers of coupling data structures in preCICE from one time window to the next. In addition to these enhancements, further studies for arbitrary high order (as in [47]) will be necessary.