Convergence analysis for parallel-in-time solution of hyperbolic systems

Parallel-in-time algorithms have been successfully employed for reducing time-to-solution of a variety of partial differential equations, especially for diffusive (parabolic-type) equations. A major failing of parallel-in-time approaches to date, however, is that most methods show instabilities or poor convergence for hyperbolic problems. This paper focuses on the analysis of the convergence behavior of multigrid methods for the parallel-in-time solution of hyperbolic problems. Three analysis tools are considered that differ, in particular, in the treatment of the time dimension: (1) space-time local Fourier analysis, using a Fourier ansatz in space and time, (2) semi-algebraic mode analysis, coupling standard local Fourier analysis approaches in space with algebraic computation in time, and (3) a two-level reduction analysis, considering error propagation only on the coarse time grid. In this paper, we show how insights from reduction analysis can be used to improve feasibility of the semi-algebraic mode analysis, resulting in a tool that offers the best features of both analysis techniques. Following validating numerical results, we investigate what insights the combined analysis framework can offer for two model hyperbolic problems, the linear advection equation in one space dimension and linear elasticity in two space dimensions.


INTRODUCTION
While spatial parallelism is a well-known tool in scientific computing, hardware trends and scaling limits have served to renew interest in algorithms that also allow "space-time" parallelism. These techniques consider the solution of time-dependent systems of partial differential equations (PDEs), and aim to compute the solution in an "all-at-once" manner, breaking the sequential nature of traditional time-stepping approaches. While this is not a new idea 1, 2 , recent years have seen significant effort devoted to space-time and time-parallel approaches for the solution of time-dependent PDEs [3][4][5][6][7][8][9][10] . With such intense interest in the development of new schemes, there is a pressing need for complementary analysis tools, to provide understanding of the relative performance of related schemes and to inform the optimization of algorithmic parameters as schemes are adapted to new problems. The central aim of this paper is to compare and contrast three such analysis schemes for parallel-in-time algorithms of the Parareal 11,12 and multigrid-reduction-in-time 3 (MGRIT) methodologies.
From the multigrid perspective, mode analysis is an attractive tool for analyzing convergence of these methods, using either eigenvectors or Fourier modes (and related techniques) to predict convergence rates. In the context of space-time discretizations, three approaches to mode analysis have been discussed in the recent literature: space-time local Fourier analysis 4,13,14 (LFA), semi-algebraic mode analysis 9,15,16 (SAMA), and reduction-based theory 8,[17][18][19] . A fourth approach, Fourier-Laplace mode analysis [20][21][22] , can be seen as an intermediate between space-time LFA and the other two approaches, and will be discussed only in that context. While not considered here, the recent analysis of Gander et al. 23 extends the earlier result of Gander and Hairer 24 from the Parareal context to MGRIT. These approaches bound the convergence of the algorithm based on Lipschitz continuity and other properties of the time propagators at the coarse and fine levels; while this offers insight into the general convergence properties of the algorithms, it is difficult to compare fairly to the mode analysis tools that are the focus here.
In broad terms, the advantages and disadvantages of the three mode analysis tools are as follows. LFA is a well-known and widely used tool for analysis of spatial multigrid methods 25,26 , for which it allows quantitative predictions of two-and multilevel convergence. However, numerical experience 13,14 shows that it is not predictive for space-time methods in many situations, particularly for meshes with few points in time or discretizing "short" temporal domains. The extension of LFA to semi-algebraic mode analysis 15 overcomes this limitation on space-time LFA; however, as implemented by Friedhoff and MacLachlan, the quantitative estimates of SAMA required the computation of Euclidean operator norms of matrices with size equal to the number of time steps multiplied by the dimension of the spatial LFA symbol. When done exactly, this can be prohibitively expensive for complicated problems over long time intervals. Two-level reduction analysis 8,17 overcomes this computational expense, under certain conditions, but is based on solving eigenvalue problems of size equal to the number of degrees of freedom in the spatial discretization. In the results below, we explore each of these methods for two specific hyperbolic model problems, linear advection in one dimension and incompressible linear elasticity in two dimensions, and compare and contrast their predictions in these settings.
Several important observations come from this comparison. First, we see that the bounds used to make the reduction analysis computationally tractable can be directly applied to SAMA. This results in a tool that combines the best of both approaches, using the flexibility of spatial LFA in settings where the spatial eigenproblem is intractable, but the ease of computing a bound for large numbers of time steps that comes from reduction analysis. Secondly, we expose the complications of assuming unitary diagonalizability, as is done in the reduction analysis, and detail how this can be avoided in the SAMA bound. Overall, this allows us to get more insight into MGRIT convergence, particularly as we vary both spatial and temporal problem sizes. A final benefit is the ability to critically compare space-time LFA with the improved SAMA analysis, and gain better insight into problems and parameter regimes for which space-time LFA may be a feasible and sufficiently accurate option.
The remainder of this paper is organized as follows. Details of the model problems, linear advection in one spatial dimension and incompressible linear elasticity in two spatial dimensions, are given in Section 2. The Parareal and MGRIT algorithms are reviewed in Section 3. Section 4 describes the three mode analysis tools in detail, with discussion on how to combine aspects of SAMA and reduction analysis in Section 4.4. Numerical results comparing the analysis methods as they exist in the literature are given in Section 5.1. Sections 5.2 and 5.3 present numerical results giving new insight into convergence of the Parareal and MGRIT algorithms for linear advection and elasticity, respectively, based on SAMA improved by insights from reduction analysis. Concluding remarks are given in Section 6.

Linear advection
We consider the advection of a scalar quantity, ( , ), subject to a known non-zero constant flow speed, , in the domain Ω × [0, ], where Ω is a finite interval, [ , ], and denotes the final time. For example, this models advection of nonreactive particles by an incompressible fluid, where particle density, ( , ), depends only on advection of particles by the fluid. The governing equation is given by where we assume > 0 for the subsequent discussion. We prescribe the initial condition ( , 0) = 0 ( ) and impose the periodic boundary condition ( , ) = ( , ) in all that follows. We discretize (1) on a uniform space-time grid, with spatial intervals of width Δ = ( − )∕ and temporal intervals of time step Δ = ∕ , using a first-order implicit upwind scheme:

Linear elasticity
We consider the dynamic and linear elastic response of an incompressible solid structure in the domain Ω × [0, ], where Ω is an open domain in ℝ 2 , and denotes the final time. Denoting the current and reference position of a material point by and , respectively, and the respective Eulerian and Lagrangian gradient operators by ∇ and ∇ , we define the deformation gradient, , by = ∇ = + ∇ , where ( , ) = + ( , ) defines the displacement, , of the material point, , in the current configuration at time with respect to its position in the reference configuration, , and where denotes the identity matrix of corresponding size. Then, the governing equations are given by where denotes material density, and where = ( , ) = ( − ) − is the Cauchy stress tensor for an incompressible linear elastic material with stiffness parameter, , and hydrostatic pressure, . We prescribe and at = 0, ( , 0) = 0 and ( , 0) =̂ 0 , and impose homogeneous Dirichlet boundary conditions, ( , ) = 0 for ∈ Γ and ∈ [0, ], where Γ denotes the Dirichlet boundary of the domain Ω. Using = ∇ − , neglecting higher-order effects of the deforming domain, and transforming Equation (3) to a system of first-order equations, we obtain for displacement, , velocity, , and hydrostatic pressure, .
We discretize Equations (6)-(7) on an equidistant time grid consisting of ∈ ℕ time intervals using a time-step size Δ = ∕ . Motivated by existing results 8 , we use implicit Euler for the time discretization. Denoting displacement, velocity, and pressure at time = Δ , = 0, … , , by , , and , respectively, for = 1, … , , Equations (5) and (6) are discretized as and The weak form of Equations (8), (9), and (7) is found by multiplying by test functions, ∈ ( 1 (Ω)) 2 , ∈ ( 2 (Ω)) 2 , and ∈ 2 (Ω), respectively, giving We discretize the spatial domain, Ω, on a uniform quadrilateral grid with mesh size Δ using Taylor-Hood ( 2 − 1) elements 27, 28 for velocity, , and pressure, , and 2 elements for displacement, , ensuring uniqueness of the solution. Denoting the mass and stiffness matrices of the discretized vector Laplacian by and , respectively, and the (negative) discrete divergence operator by , Equations (10)-(12) are discretized to which are equivalent to the following linear system of equations It is tempting to simply remove the rows and columns corresponding to the pressure variable from this system, since −1 does not appear in the equations at time step . Indeed, this approach was taken in the analysis by Hessenthaler et al. 8 . However, to do so ignores the important role that plays as a Lagrange multiplier, enforcing the incompressiblity constraint for the solution at time step . Instead, we eliminate the contribution from this block by considering the block factorization of the matrix in (16) and the resulting Schur complement. Factoring this matrix gives , and can be directly eliminated from the equation for by subtracting Δ from the right-hand side of (13). Noting that both velocity and displacement are two-dimensional vector fields of 2 degrees of freedom, this yields the reduced system with four scalar functions, It is this form of the propagator that we analyse below. We note that although the only nonzero operator acting on the pressure acts on , its effect is to change the dependence of and on −1 and −1 . This can easily be seen to take the form of an orthogonal projection operator acting on the data from the previous time-step; it is also easy to check that this formulation guarantees that = 0, as required by the incompressibility constraint. As is common in MGRIT, we primarily analyse this propagator in "Φ-form", i. e., in the form [ ] = Φ[ −1 −1 ] , writing the projection operator, = − (

Parareal
The Parareal algorithm 12 is a parallel method for solving systems of ordinary differential equations of the form arising, for example, when solving a system of PDEs using a method-of-lines approximation to discretize the spatial domain. Parareal can be interpreted in a variety of ways, including as multiple shooting, domain decomposition, and multigrid methods 3,11 . Here, we describe Parareal as a two-level time-multigrid scheme. For ease of presentation, we only describe the method in a linear setting, i. e., in the case that is a linear function of ( ); the full approximation storage (FAS) approach 29 can be applied in the same manner to accomodate nonlinear problems. Parareal combines time stepping on the discretized temporal domain, the fine grid, with time stepping on a coarser temporal grid that uses a larger time step. More precisely, consider a fine temporal grid with points = Δ , = 0, 1, … , , with constant time step Δ = ∕ > 0, and let be an approximation to ( ) for = 1, … , , with 0 = (0). Further, consider a one-step time integration method, with time-stepping operator, Φ, that takes a solution at time −1 to that at time , along with a time-dependent forcing term, . (Note that both the assumption of constant time step and of a time-independent single-step time-stepping operator are for notational convenience only, and can easily be relaxed.) The discrete approximation to the solution of (18) can be represented as a forward solve of the block-structured linear system Note that, in the time dimension, this forward solve is completely sequential. Parareal enables parallelism in the solution process by introducing a coarse temporal grid, or (using multigrid terminology) a set of C-points, derived from the original (fine) temporal grid by considering only every -th temporal point, where > 1 is an integer called the coarsening factor. Thus, the coarse temporal grid consists of = ∕ intervals, with points = Δ , = 0, 1, … , , with spacing Δ = Δ ; the remaining temporal points define the set of F-points, as visualized in Figure 1. The coarse-grid problem, Δ = Δ , is defined by considering a one-step method with time-stepping operator, Φ , using time step Δ , given by where Δ, denotes an approximation to ( ) for = 1, … , , and Δ,0 = (0). Rather than sequentially solving for each , Parareal alternates between a "relaxation scheme" on the F-points and a sequential solve on the C-points. The first of these two processes, known as F-relaxation, updates the unknowns at F-points by propagating the values from each C-point, , across a coarse-scale time interval, ( , +1 ), = 0, 1, … , − 1 using the fine-grid timestepper, Φ. Note that within each coarse-scale time interval, these updates are sequential, but that there is no dependency across coarse time intervals, enabling parallelism in the relaxation process. After F-relaxation, the residual is evaluated only at the Cpoints and "injected" to the coarse temporal grid. After a sequential solve of the coarse-grid equations (for which typical choices in Parareal use high-order integration schemes over longer time intervals), a correction is interpolated to the fine grid, using "ideal interpolation" (which is "ideal" as it corresponds to using the Schur complement on the coarse grid). This interpolation operator is defined by taking the corrected approximate solution at at each C-point, , and time-stepping across the coarse-scale time interval, ( , +1 ), = 0, 1, … , − 1, again using the fine-grid time-stepper, Φ. The injection and ideal interpolation operators are block operators of size ( + 1) × ( + 1) or ( + 1) × ( + 1), respectively, given by As the error propagator of F-relaxation can be written as Φ , the Parareal algorithm may be represented by the two-level iteration matrix, where equivalence holds since Φ defines the Schur complement coarse-grid operator,

Multigrid-reduction-in-time
The major sequential bottleneck for Parareal is the sequential solve of the coarse-grid system, Δ = Δ . Directly applying multigrid principles and using Parareal recursively to solve this system is observed to often have significantly degraded convergence properties. This motivates the multigrid-reduction-in-time (MGRIT) algorithm 3 , which is based on applying multigrid reduction techniques 30,31 to the time dimension. Like Parareal, MGRIT uses injection and ideal interpolation for intergrid transfer operations, and it uses the same coarse-grid operator, , as can be used in the two-level Parareal method. However, in order to obtain a scalable multilevel algorithm, MGRIT augments F-relaxation, using relaxation also at the C-points. This Crelaxation is defined analogously to F-relaxation, by leaving the values of F-points unchanged, and updating the unknowns at C-points using the time-stepper, Φ, and values at neighboring F-points. Relaxation in MGRIT consisting of combined sweeps of F-and C-relaxation; typically, FCF-relaxation, F-relaxation, followed by C-relaxation, and again F-relaxation, is employed. This scheme can be shown to be equivalent to Richardson relaxation on the coarse time grid with the Schur complement coarse-grid operator, , followed by ideal interpolation 3 . Writing the error propagator of FCF-relaxation as Φ ( − ) , the two-level MGRIT algorithm may be represented by the two-level iteration matrix, Instead of inverting directly as in the two-level MGRIT algorithm (21), in the three-level method, the system on the first coarse grid is approximated by a two-grid cycle with zero initial guess, that is, the term −1 in the two-level iteration matrix (21) is replaced by ( − (2,3)− ) −1 (where (2,3)− is defined analogously to (21)) to obtain the three-level V-cycle iteration matrix, or, it is replaced by ( − (2,3)− (2,3)− ) −1 to obtain the three-level F-cycle (or, equivalently, W-cycle) iteration matrix respectively. Iteration matrices of three-level V-and F-cycle Parareal methods can be derived analogously. While there are clear and important differences, both historical and in practice, between Parareal and MGRIT, in this paper we view them as being two methods within the same broader family. As such, we will consider in detail only the case where the coarse-grid operator, , is given by rediscretization of the fine-grid operator with the coarse time-step. Since Parareal methods often make use of different coarse-and fine-propagators, we will instead refer to the methods where this condition is imposed as MGRIT with F-and FCF-relaxation, instead of as Parareal and MGRIT, respectively, but use the more distinctive terminology for statements that are true regardless of the choice of coarse propagator. An interesting property of Parareal and MGRIT is the following exactness property: Assuming exact arithmetic, one iteration of F-relaxation computes the exact solution at the first − 1 time steps, i. e., at all F-points in the first coarse-scale time interval. Furthermore, one iteration of FCF-relaxation computes the exact solution at the first 2 − 1 time steps. Therefore, MGRIT with F-relaxation solves for the exact solution in ∕ iterations, while MGRIT with FCF-relaxation solves for the exact solution in ∕(2 ) iterations, corresponding to half the number of points on the coarse grid.

MODE ANALYSIS TOOLS FOR PARALLEL-IN-TIME METHODS
While both Parareal and MGRIT can be analysed based on abstracted properties of the coarse and fine time integrators 23,24 , these tools do not provide deep insight into the success or failure of algorithmic choices within the methodology. Instead, here we present three tools based on mode analysis principles applied to the iteration matrices, and , given in Equations (20) and (21), respectively, and to the iteration matrices of the respective three-level methods.

Space-time local Fourier analysis
Local Fourier analysis is a classical tool for analysing convergence of multigrid methods 26,29 , that has been applied to spacetime problems with mixed results in terms of accuracy in predictions 4,13 . Here we present a brief review of the calculation of spatial Fourier symbols (as will be used in all of the mode analysis tools considered in this section), along with their coupling with local Fourier analysis in the time direction.

Spatial Fourier symbols
The local Fourier analysis of the time-stepping operator, Φ, makes use of its Fourier representation, also called the Fourier symbol. For the scalar advection equation, this follows from standard analysis; however, LFA for systems of PDEs requires more advanced tools 32,33 . Note that we do not consider any coarsening of the spatial domain here and, thus, we only need the spatial Fourier symbol of the time-stepping operator.
Standard LFA for a problem in one spatial dimension considers a scalar discrete problem posed on an infinite uniform grid with mesh size Δ > 0, Extending a constant-coefficient discrete operator from a finite spatial domain, such as that on the left of Equation (2), to the infinite grid Δ leads to a Toeplitz operator acting on the space of square-summable sequences ( 2 ) that is formally diagonalizable by standard Fourier modes, Noting that the operator Φ is the inverse of the lower bidiagonal matrix on the left of (2), we find that its Fourier symbol, denoted Φ , is the scalarΦ Furthermore, the Fourier symbols for the time integrators Φ and Φ on the first and second coarse grids can be obtained by replacing Δ by Δ in (24) to account for factor-temporal coarsening. We note that for a spatial problem with periodic boundary conditions, the eigenvalues of Φ are given precisely by samplingΦ at evenly spaced points in (− , ]; for other boundary conditions, a good estimate of the discrete eigenvalues of Φ is given by the range of its Fourier symbol 34 . For the elasticity operator in (17), we must generalize this analysis to consider a two-dimensional infinite uniform grid, along with adaptations to handle the block structure imposed by the mixed finite-element discretization. Accordingly, we consider the two-dimensional infinite uniform grid, decomposed as with Δ , This decomposition arises by considering the variation in basis functions for the standard 2 approximation space, with nodal basis functions defined for mesh points at corners of Δ × Δ quadrilateral elements, ∈ Δ , , mesh points located on the -and -edges of elements, ∈ Δ , or ∈ Δ , , and cell-centered mesh points, ∈ Δ , . While the finite-element discretization of a scalar PDE on a structured two-dimensional infinite uniform grid using nodal ( 1) finite elements leads to an operator that is block Toeplitz with Toeplitz blocks (BTTB) that has a scalar symbol, the same is not true for higher-order elements. However, by using the same basis on each element and permuting the operator into block form ordered by "type" of basis function, we readily obtain a block operator with blocks that are themselves BTTB operators and can be used to generate a matrix-based Fourier symbol 33,35 . On Δ , this arises by considering a discrete operator, Δ , defined in terms of its stencil for each "type" of degree of freedom. For example, for the nodal degrees of freedom, we have with constant coefficients, ∈ ℂ, and = ( 1 , 2 ) taken from a finite index set, Similar definitions are used for the discrete operator acting on degrees of freedom corresponding to mesh edges and cell centers. Note that the decomposition of the index set, , naturally defines Δ as a block operator, with each block corresponding to one type of mesh point. For the elasticity operator in (17), we obtain a block operator of size 16 × 16, since there are four scalar functions in the system, each discretized using Q2 elements with four types of degrees of freedom.
With this, the Fourier representation (symbol), of an operator, Δ , denoted bỹ Δ ( ), with ∈ (− , ] 2 , is a block matrix that can be computed using a Fourier basis that accounts for the different types of mesh points of Δ . This Fourier basis is given by for ∈ (− , ] 2 , with Considering the discretization of a system of PDEs with types of degrees of freedom (corresponding to both different functions in the PDE system and different basis functions used in higher-order discretizations of a single function, e. g., = 16 for the elasticity system) on a quadrilateral grid, the discretized operator can be written as a block operator, noting that , Δ is a map from a function associated with degree of freedom to that associated with degree of freedom , which may be defined on different types of meshpoints. Formally, we think of Δ ( ) as describing the action of  Δ on the -dimensional space given by linear combinations of the Fourier modes associated with each type of degree of freedom in the system. Written as a × matrix, Δ ( ) maps the coefficients of a vector-valued function, ⃗ Δ , in this basis to the coefficients of the function described by  Δ ⃗ Δ .
For the linear elasticity problem, the time-stepping operator, Φ, is defined in Equation (17), Noting that both the velocity and displacement degrees of freedom are two-dimensional vector fields of 2 degrees of freedom, we have four scalar functions in the system, whose basis is naturally partitioned into four types, with nodal, edge-based, and cell-centered degrees of freedom. Thus, the Fourier symbol,Φ, of Φ is a block matrix of size 16 × 16, computed from the Fourier symbols of its component parts, The Fourier symbols of , , and are derived from standard calculations 36,37 , and given for reference in Appendix A.

LFA in time
We consider the infinite uniform fine and coarse temporal grids, where the grid is derived from by multiplying the mesh size, Δ , by a positive coarsening factor, . The fundamental quantities in LFA are the Fourier modes, given by the grid-functions, with frequency, , taken to vary continuously in the interval − , 2 − , although any interval of length 2 could be used instead. Fourier modes on the coarse grid are defined analogously on the interval (− , ] as explained below. Considering a coarsening factor of , a constant-stencil restriction operator maps fine-grid functions, the Fourier harmonics, into one coarsegrid function. More precisely, these functions consist of the mode associated with some base index (0) ∈ − , and those associated with the frequencies ( ) The Fourier harmonics associated with base frequency, (0) , define subspaces, that are left invariant by the iteration operator. As a consequence, using the matrix of Fourier modes, ordered by Fourier harmonics, we can block-diagonalize the infinite grid operators with blocks corresponding to the spaces of harmonic modes. Each block describes the action on the coefficients { } =0,…, −1 of the expansion, Instead of analyzing the error propagation on this basis for the space of harmonics,  (0) , we consider the transformed basiŝ wherê denotes the vector with entries equal to one at time points + , for ∈ ℕ, and zero otherwise. This transformed basis is motivated by the following: Consider a function, ∈  (0) , with = + for ∈ ℕ and ∈ {0, 1, … , − 1}. Then, Thus, any function in  (0) can be expressed in terms of the basis defining (0) , with coefficients {̂ } =0,…, −1 that depend only on the "offset" of fine-grid point = + from the coarse-grid indices where = 0. Moreover, implicit in this expression is an associated coarse-grid frequency of (0) , so that e (0) ⋅ + ∕Δ = e ( (0) )⋅ + ∕Δ . In many senses, this is a more natural basis for the space of Fourier harmonics, since it directly associates a repeating pattern of coefficients in the basis of (0) with the temporal mesh, as illustrated in Figure 2.̂ 0̂ 1̂ 2 For the three-level analysis, we consider a hierarchy of three time grids, with grid spacings Δ , Δ , and 2 Δ , respectively. Correspondingly, we consider base frequencies, (0) ∈ − 2 , 2 , and time-series coefficients {̂ } =0,…, 2 −1 , {̂ } =0, ,…,( 2 −1) , and̂ 0 , respectively. Figure 3 shows the time-series coefficients on the time grids for the case = 4 and Visualization of the time-series coefficients on the three-level time-grid hierarchy with coarsening factors = 4 and 2 = 2.

Definition 1.
Let ℎ be an infinite-grid (multilevel) Toeplitz operator, and for given be a function with uniquely defined coefficients {̂ } =0,…, −1 . The transformation of the coefficients under the operator ℎ , A complete convergence analysis of the convergence properties of a Parareal or MGRIT algorithm arises from computing the symbols of each individual operator of the corresponding iteration matrix, , to obtain the symbol,̂ ( , (0) ), of the iteration matrix as a whole. The asymptotic convergence behavior can then be predicted by calculating the asymptotic convergence factor Note that, in practice, LFA has to be taken from its infinite-grid setting, and we maximize, instead, over a finite set of frequency tuples, ( , (0) ). Similarly, we introduce the error reduction factor again maximizing over finite sets of values of and (0) to get a computable prediction. Since for both the Parareal and MGRIT approaches, iteration operators have only a single eigenvalue of zero, only the error reduction factor provides insight into the convergence behavior of a Parareal or MGRIT algorithm. Furthermore, due to the non-normality of the iteration operators, it is crucial to not only consider the iteration matrix itself, but also powers of the iteration matrix to examine the short-term convergence behavior. More precisely, for any initial error, where denotes the number of iterations of the multigrid scheme. Thus, we introduce the error reduction factors providing the worst-case error reduction, i. e., an upper bound for the error reduction, in iteration for the time-periodic problem.
In practice, we observe that this also provides upper bounds for the worst-case error reduction in the non-periodic case but this, of course, is no longer a rigorous bound.

Space-time Fourier symbols of MGRIT with F-and FCF-relaxation
LetΦ andΦ , denote the × ( ≥ 1) spatial Fourier symbols of the fine-and coarse-scale time integrators, Φ and Φ , respectively, and let denote an identity matrix of size × . Then, the space-time Fourier symbol of the fine-grid operator, , is given by the × matrix̂ The set of modes ( , (0) ) for (0) ∈ − , is a complete set of space-time Fourier modes on the coarse grid, Ω . Further, any function in  ( , (0) ) is aliased on the coarse grid with the function̂ 0 ( , (0) ) with a fixed coefficient̂ 0 . As a consequence, the space-time Fourier symbol of the coarse-grid operator, , is a block matrix with one block of size × , given bŷ The space-time Fourier symbols of the interpolation and restriction operators, Φ , and , are block matrices with × 1 blocks or 1 × blocks of size × ,̂ Since the F-and FCF-relaxation operators, and , are equal to Φ and Φ ( − ) , respectively, the space-time Fourier symbols of F-and FCF-relaxation are defined bŷ This completes the definition of the space-time Fourier symbols of the operators for F-and FCF-relaxation and, by using the expressions (20) and (21), this defines the space-time Fourier symbols of the two-level methods as a whole. A similar approach extends to the three-grid case, however, the grid hierarchy affects the block size of the space-time Fourier symbols. Considering temporal semicoarsening by factors of and 2 , respectively, to obtain the first and second coarse grids, the space-time Fourier symbol of the fine-grid operator, , is the same as in the two-level case, but with 2 × 2 blocks of size × instead of × blocks of size × . Denoting the spatial Fourier symbols of the time integrators, Φ and Φ on the first and second coarse grid byΦ , andΦ , , respectively, the space-time Fourier symbol of the coarse-grid operator, , on the first coarse grid is a block matrix with 2 × 2 blocks of size × , , and the space-time Fourier symbol of the coarse-grid operator, , on the second coarse grid is a block matrix with one block of size × ,̂ = −Φ , e − 2 (0) .
The space-time Fourier symbols of the Schur complement coarse-grid operators on the first and second coarse grids, and , , are defined as those of the coarse-grid operators, and , withΦ , andΦ , replaced byΦ andΦ 2 . The space-time Fourier symbols of the restriction operators, and , , from the fine grid to the first coarse grid and from the first coarse grid to the second coarse grid are block matrices with 2 × 2 blocks of size × or 1 × 2 blocks of size × , respectively,̂ blocks Similarly, the space-time Fourier symbols of the interpolation operators, Φ and Φ , from the first coarse grid to the fine grid and from the second coarse grid to the first coarse grid, respectively, are block matrices with 2 × 2 blocks of size × or 2 × 1 blocks of size × , respectively,

Semi-algebraic mode analysis
LFA focuses on the local character of the operators defining the multilevel algorithms. This means that the effects of boundary conditions, including the initial condition in the time dimension, are ignored. Unless long time intervals are considered, LFA can fail to produce its usual quality of predictive results for convection-dominated or parabolic problems 13,14,38,39 . Semi-algebraic mode analysis (SAMA) 15 is one applicable approach of mode analysis that enables accurate predictions of the convergence behavior for multigrid and related multilevel methods. The analysis combines spatial LFA with algebraic computations that account for the non-local character of the operators in time. Other applicable approaches, not considered here, include halfspace mode analysis 38, 40-43 that considers convergence on a discrete half-plane instead of the full infinite lattice used in LFA, and Fourier-Laplace analysis 20-22, 44, 45 , based on using discrete Laplace transforms. For simplicity, we describe SAMA for the two-level MGRIT algorithm with FCF-relaxation, represented by the two-level iteration matrix, , defined in Equation (21), the analysis of other variants is done analogously by considering the corresponding iteration matrix. Motivated by the global block-Toeplitz space-time structure of the operators that define the iteration matrix, the idea of SAMA is to use a Fourier ansatz in space to block-diagonalize the spatial blocks of each individual operator. More precisely, we block-diagonalize the blocks of each operator using the matrix, Ψ, of discretized spatial Fourier modes, with Fourier frequency pairs, ∈ (− , ] 2 , sampled on a uniform quadrilateral mesh given by the tensor product of an equally-spaced mesh over the interval of length 2 with itself, assuming degrees of freedom in the spatial discretization. For example, the spatial blocks of a scalar coarse-grid operator, , are block-diagonalized by computing  −1  , where  = +1 ⊗ Ψ. The resulting matrices (on the fine temporal grid) are then reordered from + 1 × + 1 blocks of size × to a block matrix with × blocks of size + 1 × + 1 to obtain a block diagonal structure, with each block corresponding to the evolution of one spatial Fourier mode over time. Applying SAMA to the iteration matrix, , we obtain a block diagonal matrix, with a discrete choice of Fourier frequencies, , and where the diagonal blocks of the Fourier-transformed and permuted operators are denoted by (⋅) marking the respective operator in the superscript and the spatial Fourier frequency pair, , in the subscript. Note that the grid hierarchy and size of the spatial Fourier symbol of the time integrators, Φ and Φ , affect the block size of the transformed operators. Considering temporal semicoarsening by a factor of in the MGRIT approach and assuming that the spatial Fourier symbol of the time-integration operators, Φ and Φ , are of size × , the blocks ( ) and ( ) are of size ( ∕ + 1) × ( ∕ + 1) , and the blocks and ( ) are of size ( + 1) × ( ∕ + 1) and ( ∕ + 1) × ( + 1) , respectively.
The short-term convergence behavior of MGRIT with iteration matrix, , can then be predicted by calculating the error reduction factor (corresponding to worst-case error reduction in iteration ) Similarly to LFA, in practice, we maximize over a finite set of frequencies, .

SAMA block matrices for MGRIT with F-and FCF-relaxation
Consider temporal semicoarsening by factors of and 2 , respectively, to obtain the first and second coarse grids, and letΦ , Φ , , andΦ , again denote the spatial Fourier symbols of the time integrators, Φ, Φ , and Φ on the fine, first, and second coarse grid, respectively; for a two-level hierarchy, consider the first coarse grid only. Then, the diagonal blocks of the Fouriertransformed and permuted operators, , , and are block bidiagonal matrices with ( +1)×( +1), ( ∕ +1)×( ∕ +1), or ( ∕( 2 ) + 1) × ( ∕( 2 ) + 1) blocks, respectively, of size × , given by The diagonal blocks of the Fourier-transformed and permuted Schur complement coarse-grid operators on the first and second coarse grids, and , , are defined analogously, with −Φ and −Φ 2 on the first subdiagonal. The diagonal blocks of the Fourier-transformed and permuted restriction operators, and , , from the fine grid to the first coarse grid and from the first coarse grid to the second coarse grid are block matrices with ( ∕ + 1) × ( + 1) blocks of size × or ( ∕( 2 ) + 1) × ( ∕ + 1) blocks of size × , respectively, Similarly, the diagonal blocks of the Fourier-transformed and permuted interpolation operators, Φ and Φ , from the first coarse grid to the fine grid and from the second coarse grid to the first coarse grid, respectively, are block matrices with ( + 1) × ( ∕ + 1) blocks of size × or ( ∕ + 1) × ( ∕( 2 ) + 1) blocks of size × , respectively,

Two-level reduction analysis
Two-level reduction analysis 17,19 provides convergence bounds for two-level Parareal and MGRIT algorithms applied to linear time-stepping problems with fine-and coarse-scale time-stepping operators that can be diagonalized by the same set of eigenvectors. In particular, the analysis allows predictions of the convergence behavior for time-stepping problems arising from a method-of-lines approximation of a time-dependent PDE when considering a fixed spatial discretization in the time-grid hierarchy. In Section 4.3.1, we review the ideas behind the two-level reduction analysis for time-stepping problems arising from scalar PDEs and discuss how it can be combined with LFA in space. Section 4.3.2 is devoted to extending the analysis to time-stepping problems arising from systems of PDEs using finite-element discretizations for the discretization in space.

Two-level reduction analysis for scalar PDEs
We consider solving a time-stepping problem (19), arising from a scalar PDE, by two-level MGRIT. As above, let Φ and Φ denote the two time-stepping operators on the fine time grid with time intervals, and on the coarse time grid with = ∕ time intervals, respectively. Furthermore, assume that degrees of freedom are used for the discretization in space, so that Φ and Φ are matrices of size × . Motivated by the reduction aspect of MGRIT, in contrast to analyzing full iteration matrices as in the SAMA approach, we consider the iteration matrices only on the coarse grid, where and denote the coarse-grid operator and the Schur complement coarse-grid operator, respectively, introduced in Section 3.1.
Two-level reduction analysis is based on the same analysis techniques used in SAMA, but applied to the coarse-grid iteration matrix, Δ , instead of to the fine-grid iteration matrix, , of the algorithm. Furthermore, instead of using a Fourier ansatz in space, the eigenvectors of the fine-grid time-stepping operator are used to diagonalize the fine-and coarse-scale time integrators (under the assumption that they are simultaneously unitarily diagonalizable) and, thus, the spatial blocks of the coarse-grid iteration matrix, Δ . More precisely, using the unitary transformation, , that diagonalizes the fine-and coarse-scale time integrators, Φ and Φ , the blocks of the coarse-grid iteration matrix, Δ are diagonalized by computing  −1 Δ , where  = +1 ⊗ . Similarly to SAMA, the resulting matrix is then permuted to obtain a block diagonal structure, with each block corresponding to the evolution of one eigenvector over time. Denoting the eigenvectors of the fine-and coarse-scale time integrators, Φ and Φ , by , and the corresponding eigenvalues by and , respectively, for = 1, … , , we obtain Using the standard norm inequality ‖ Δ ‖ 2 ≤ √ ‖ Δ ‖ 1 ‖ Δ ‖ ∞ for an operator Δ , the convergence behavior of a two-level method is then predicted by calculating the bounds, that have since been shown to be accurate to order (1∕ ) 19 . Assuming that | | ≠ 1 for all = 1, … , , we obtain 17 (1 − | |) | | , and, thus, Remark 1. Note that the two-level reduction analysis requires solving an eigenvalue problem of the size of the degrees of freedom of the spatial discretization to compute the eigenvalues of the two time-integrators, which are assumed to be simultaneous unitarily diagonalizable. To avoid the high computational cost of this eigensolve for high spatial resolutions (and complications in the non-normal case), LFA in space can be applied, providing predictions of the spectra of the time-stepping operators. Thus, the eigenvalues and of Φ and Φ can be replaced by the spatial Fourier symbols, Φ and Φ , , respectively, choosing frequency values for .

Remark 2.
Considering powers of the matrices Δ, and Δ, , we can analogously derive bounds of the 2 -norm of powers of the coarse-grid iteration matrices. For ≥ 2, we obtain Δ, Thus, Using the definition of the binomial coefficient, for ≥ 2 we obtain

Systems of PDEs
The assumption that a time-stepping operator can be diagonalized by a unitary transformation does not necessarily hold true for time-stepping operators arising from systems of PDEs, since such time integrators can easily be non-symmetric. However, when considering a two-level time-grid hierarchy and a fixed spatial discretization on both grid levels, the assumption that the fineand coarse-scale time integrators, Φ and Φ , can be simultaneously diagonalized may still hold true for time-stepping problems arising from systems of PDEs. One possible generalization of the two-level reduction analysis to handle this case is to derive bounds for the coarse-grid iteration matrix in a mass matrix-induced norm instead of in the 2 -norm 17,19 . Here, to enable a comparison of the different analysis techniques for the systems case, we derive bounds of the 2 -norm of the coarse-grid iteration matrices.
We consider × block time-stepping operators, Φ and Φ , arising from a semi-discretization in space of a system of scalar PDEs with unknown functions; note that in the case of a PDE system with vector equations, we first have to break down the vectors into their scalar components. Furthermore, let Δ be the coarse-grid iteration matrix of a two-level method, given in Equation (33), defined using the block time integrators, Φ and Φ . To derive an upper bound on the 2 -norm of Δ , we first consider a Fourier ansatz in space. More precisely, we use the Fourier matrix, , of discretized Fourier modes of a basis that accounts for the × -coupling within Φ and Φ and, possibly, different nodal coordinates. Note that is a square matrix of dimension equal to that of Φ and Φ . We then reorder the transformed block matrix,  −1 Δ  , where  = +1 ⊗ , from + 1 × + 1 blocks of size × , where = dim(Φ), to a block-diagonal matrix with × blocks of size ( + 1) × ( + 1). For the two-level algorithm with -relaxation, for example, we obtain and . Note that the spatial Fourier symbols,Φ and Φ , , = 1, … , , are dense block matrices of size × . Therefore, for each = 1, … , , we simultaneously diagonalize the Fourier symbols,Φ andΦ , , using the eigenvector matrix, , ofΦ , Finally, we reorder the transformed block-Toeplitz matrix with diagonal blocks,  −1 ̃ Δ,  , where  = +1 ⊗ , to a block-diagonal matrix with Toeplitz blocks. For a two-level method with F-relaxation, we obtain Using the norm computations from Section 4.3.1, we obtain the following Lemma 1. Let Φ and Φ be × block time-stepping operators, and letΦ andΦ , , = 1, … , , where = dim(Φ) = dim(Φ ) be the corresponding spatial Fourier symbols. Furthermore, assume that for all = 1, … , ,Φ andΦ , , can be simultaneously diagonalized with eigenvalues , =1,…, and , =1,…, , respectively, and that | , | ≠ 1 for all = 1, … , , = 1, … , . Then, the 2 -norm of the coarse-grid iteration matrices, Δ and Δ , of the two-level methods satisfy and where ( ) denotes the condition number of .
Proof. Let denote the Fourier matrix that transforms Φ and Φ into block-diagonal form with blocks given by the spatial Fourier symbols, Φ and Φ , . Furthermore, let  denote the permutation that reorders the Fourier-transformed block matrix,  −1 Δ  , where  = +1 ⊗ , into global block-diagonal structure with × Toeplitz blocks,̃ Δ, , given in Equation (38) for the two-level method with F-relaxation. Both applied transformations are unitary transformations and, thus, The bound for the two-level method with F-relaxation can then be derived as where is the transformation defined in Equation (39), for = 1, … , , and  −1  −1 ̃ Δ,   = diag( Δ, , ) =1,…, , with permutation  that reorders  −1 ̃ Δ,  into block-diagonal structure with Toeplitz-blocks, Δ, , . The bound for the coarse-grid iteration matrix of the two-level method with -relaxation is derived analogously.

Blending SAMA and reduction analysis
The closeness of SAMA and the two-level reduction analysis motivates applying ideas from SAMA in the two-level reduction analysis and vice versa. In one direction, we can easily make use of the bound commonly used in reduction analysis to improve the computability of the SAMA prediction, in place of the singular value computation that was used in prior work 15 . This yields where is the iteration matrix using either F-or FCF-relaxation. Similarly, we can compute the SAMA bound using only the coarse representation of the propagators, with Δ or Δ . Both of these are explored below, in Section 5.1.
In the other direction, we can investigate the differences between the predictions made by reduction analysis on the coarse grid 17 or consider the full fine-grid iteration matrix as in SAMA. Here, it is easy to derive the corresponding bounds, of For multiple iterations, ≥ 2, these bounds becomẽ

NUMERICAL RESULTS
Numerical experiments presented in this section are organized in two parts: first, in Section 5.1, we compare and contrast the three mode analysis tools of Section 4. Secondly, Sections 5.2 and 5.3 are devoted to using appropriate tools for gaining insight into effects of model and algorithmic parameters on the convergence behavior of the methods presented in Section 3.

Comparing the three mode analysis tools
The three mode analysis tools differ, in particular, in the treatment of the time dimension. While space-time LFA uses a Fourier ansatz in space and time, SAMA couples LFA in space with algebraic computations in time, and the two-level reduction analysis considers error propagation only on the coarse time grid. In this section, we compare and contrast the predictions of the three methods for the two hyperbolic model problems, linear advection in one dimension and incompressible linear elasticity in two dimensions. Here, both problems are discretized on a space-time mesh of size 64 × 64 or 64 2 × 64, respectively, with spatial mesh size Δ = 1∕2 and time-step size Δ = 1∕10. For the linear advection problem, the flow speed is chosen to be = 1 and the material parameters of the elasticity problem are chosen to be = = 1; the influence of these model parameters on the convergence behavior is considered in Sections 5.2.1 and 5.3, respectively.
In all of the mode analysis tools, we consider a Fourier ansatz in space. Note that for the two-level reduction analysis, this ansatz saves the cost of a computationally expensive solution of an eigenvalue problem, and (for all cases) is equivalent to rigorously analysing the spatial problems with periodic boundary conditions. The results presented here sample the Fourier frequency, ∈ (− , ], or the Fourier frequency pair, = ( , ) ∈ (− , ] 2 , respectively, on a discrete mesh with spacing ℎ = ∕32. For the space-time LFA predictions, the temporal Fourier base frequency, 0 ∈ (− ∕ , ∕ ], is additionally sampled on a discrete mesh with spacing ℎ = ∕32. The impact of finer Fourier frequency meshes was negligible in the examples considered here. Figure 4 shows the error reduction factors, LFA , SAMA , and RA , defined in Equations (30), (32), and (36)-(37), respectively, for the first 10 two-level iterations with F-and FCF-relaxation with factor-2 temporal coarsening applied to linear advection and linear elasticity. Note that the error reduction factors, LFA and SAMA , of space-time LFA and of SAMA are based on measuring the 2 -norm of powers of the full iteration matrices, i. e., LFA and SAMA are upper bounds for the worst-case error reduction at all time points, whereas the error reduction factor, RA , of the two-level reduction analysis provides bounds of the 2 -norm of powers of the coarse-grid iteration matrices, i. e., upper bounds for the worst-case error reduction only at C-points. Results show that LFA only predicts the initial convergence behavior, while both SAMA and RA also enable good predictions of short-term and long-term convergence behavior, covering the superlinear convergence of the methods, the effect of non-normality in early iterations, and the exactness property of the algorithms. Furthermore, the results demonstrate the difference in predictivity of SAMA and RA for the scalar case and the systems case. While in the scalar case of linear advection, predictions of SAMA and RA are close, in the systems case of linear elasticity, the prediction of RA is pessimistic due to the necessary condition number factor involved in the bound of the 2 -norm of the operators.

FIGURE 4
Error reduction factors predicted by space-time LFA, SAMA, and RA for the first 10 two-level iterations with F-and FCF-relaxation and factor-2 coarsening applied to the linear advection problem (left) and to the linear elasticity problem (right).

Combining SAMA and reduction analysis techniques
The above results show that, in the systems setting, SAMA provides better predictions than RA. However, when computing exact Euclidean operator norms of matrices with size equal to the number of time steps multiplied by the dimension of the spatial LFA symbol, SAMA becomes prohibitively expensive for large numbers of time steps. On the other hand, computing bounds instead of exact Euclidean operator norms makes the reduction analysis computationally tractable. Moreover, considering the iteration matrices only on the coarse grid instead of the full iteration matrices further reduces the cost of RA. Therefore, in this section, we explore combining the SAMA and RA approaches. Again, both model problems are discretized on a space-time mesh of size 64 × 64 or 64 2 × 64, respectively, with spatial mesh size Δ = 1∕2 and time-step size Δ = 1∕10. Furthermore, spatial Fourier frequencies are sampled on a discrete mesh with spacing ℎ = ∕32 in both dimensions.
In Figures 5 and 6, we explore the effects of considering coarse-grid iteration matrices or full iteration matrices (labeled Cpts or full in the legends) and of computing bounds or exact Euclidean operator norms (labeled 2-norm bound or 2-norm in the legends of the figures). More precisely, in addition to the error reduction factors, SAMA and RA (solid lines in the figures), also considered in Section 5.1, Figures 5 and 6 show various variants of SAMA-and RA-predicted error reduction factors. In particular, for RA, we consider using full iteration matrices instead of only coarse-grid iteration matrices. For SAMA, three additional error reduction factors are shown: one error reduction factor based on full iteration matrices, but computing bounds of the Euclidean operator norms, and two error reduction factors based on coarse-grid iteration matrices, computing either exact Euclidean operator norms or their bounds. Results using F-and FCF-relaxation show similar relationships between the different variants of error reduction factors. The difference between considering coarse-grid iteration matrices or full iteration matrices is at most a factor of about 1.4 in all cases. Since for a temporal coarsening factor of this factor is given by √ 18 , for small coarsening factors, considering only coarse-grid iteration matrices gives good estimates of actual error reduction factors that may lie in between the full and the C-point bounds or below both bounds. Furthermore, when using the bound on the 2-norm instead of computing the exact 2-norm, for the advection problem, we observe that error reduction factors increase by a factor of at most 1.7 for linear advection, and by a factor of at most 3.7 in the case of linear elasticity. Thus, for the analysis of MGRIT applied to the linear elasticity probem, considering coarse-grid iteration matrices and using the bound on the 2-norm for the SAMA prediction gives a practical improvement over RA, with predicted error reduction factors being a factor of about five or ten, respectively, smaller than those of RA, and not relying on the (unsatisfied) assumption of unitary diagonalizability.

FIGURE 5
Linear advection: Various variants of SAMA-and RA-predicted error reduction factors for the first 10 two-level iterations with F-(left) and FCF-relaxation (right) and factor-2 temporal coarsening.

FIGURE 6
Linear elasticity: Various variants of SAMA-and RA-predicted error reduction factors for the first 10 two-level iterations with F-(left) and FCF-relaxation (right) and factor-2 temporal coarsening.

Predictivity of space-time LFA
Motivated by the observation that long time intervals are needed for LFA to be predictive for parabolic problems, in this section, we investigate whether similar observations apply to hyperbolic problems by comparing space-time LFA predictions with SAMA predictions for the linear advection problem on time intervals of increasing lengths. More precisely, the linear advection problem is discretized on a space-time mesh of size 64× using various numbers of time steps, , with a spatial mesh size of Δ = 1∕2 and with fixed time-step size Δ = 1∕10; again, a flow speed of = 1 and factor-2 temporal coarsening are considered. We compute the error reduction factors LFA and SAMA for the first 20 iterations of the two-level methods. The slopes of the best-fit lines of these error reduction factors as a function of the iterations can be used to compute the predicted average error reductions per iteration, LFA and SAMA , respectively. Figure 7 shows these predicted average error reduction factors (left) and the difference, | LFA − SAMA |, of LFA-and SAMA-predicted average error reductions (right) as functions of increasing numbers of time steps, . Results show that the difference between LFA and SAMA predictions decreases with increasing numbers of time steps. More precisely, while for = 128, LFA-predicted average error reduction factors using F-and FCFrelaxation are about 23% or 90%, respectively, larger than SAMA predictions, for = 1024, LFA predictions are only about 2% or 6%, respectively, larger. Thus, results suggest that space-time LFA is a feasible option for predicting the convergence behavior for large numbers of time steps, usually corresponding to long time intervals as = Δ . Note that for hyperbolic problems, considering long time intervals may be interesting in practice, in contrast to the case of parabolic problems for which LFA becomes predictive for time intervals that generally are longer than the diffusion time scale.

FIGURE 7
Difference between SAMA and space-time LFA for linear advection. Shown are average error reduction factors measured over the first 20 iterations (left) and the difference between LFA-and SAMA-predicted average error reduction factors (right) for increasing numbers of time steps.
Interestingly, in results not shown here, we see that space-time LFA is much more predictive of behaviour over the first few MGRIT iterations than it is over later iterations. As a concrete example, in Figure 4, we see that the space-time LFA and SAMA predictions agree nearly perfectly for the first iteration, but diverge after this. When averaging over the first 10 iterations, in place of the first 20 used in Figure 7, we see improvements of about a factor of two in the differences in average error reduction predicted compared to the results shown in Figure 7. These results are offset by correspondingly higher errors in iterations 11-20, where SAMA accurately predicts improvements in convergence factors that are missed by LFA.

Investigating MGRIT convergence for linear advection
So far, we have only presented results for flow speed = 1 and two-level cycling with factor-2 temporal coarsening. In this section, we investigate the effects of other wave speeds, of multilevel cycling, and of other coarsening factors on convergence. In particular, we are interested in answering the question of which frequencies cause slow convergence when MGRIT performance degrades.

Influence of model parameters
The discrete advection problem (2) depends on the factor ∶= ( Δ )∕Δ , which can be seen as an effective CFL number, relating the flow speed and the discretization parameters Δ and Δ . To determine the effect of on convergence, we consider predicted error reduction for varying flow speeds , while keeping the discretization parameters Δ and Δ fixed. As SAMA provides the most accurate predictions among the three analysis tools and allows insights into error reduction as a function of spatial modes, in this section, we only consider SAMA. Figure 8 shows the error reduction factor SAMA for the first 20 two-level iterations with F-and FCF-relaxation and factor-2 temporal coarsening applied to linear advection with flow speeds = 2 and = 80 (corresponding to = .4 and = 16), respectively, discretized on a space-time mesh of size 64 × 512 with spatial mesh size Δ = 1∕2 and time-step size Δ = 1∕10. The right plot in Figure 8 details the error reduction for the first iteration, showing the SAMA-predicted error reduction as a function of the spatial Fourier modes. Convergence with both relaxation strategies is similar, with convergence degrading with increasing effective CFL. While for a larger effective CFL the error is reduced better for most spatial modes, the error reduction is worse for low frequency modes, indicating that the coarse-grid correction process is less effective.

FIGURE 8
Error reduction for varying wave speeds. At left, worst-case error reduction and at right, error reduction per spatial Fourier mode for the first iteration.

Effects of multilevel cycling and coarsening
In Parareal and MGRIT algorithms, computations on the coarsest temporal grid are sequential. Larger temporal coarsening factors reduce the size of the coarsest-grid problem and, thus, the cost of computations. On the other hand, large coarsening factors increase the cost of relaxation. To determine the effect of different coarsening strategies on convergence, we consider two-and three-level MGRIT variants applied to the advection problem discretized on a 64 × 256 space-time grid with spatial mesh size Δ = 1∕2 and time-step size Δ = 1∕10. The left plot in Figure 9 shows SAMA-predicted error reduction factors for two-level iterations with F-and FCF-relaxation with factor-2 and with factor-4 temporal coarsening. Increasing the coarsening factor leads to slower convergence, especially for F-relaxation. When considering the predicted average error reduction, SAMA , over the first 10 iterations, average error reduction per iteration degrades from 0.13 to 0.31 for F-relaxation and from 0.11 to 0.24 for FCF-relaxation. Note that considering a three-level method with coarsening factors and 2 leads to the same number of time points on the coarsest grid as a two-level method with a coarsening factor of 2 . Therefore, we compare convergence of two-level iterations with F-and FCF-relaxation with factor-4 coarsening with convergence of three-level schemes using factor-2 coarsening in both coarsening steps. The right plot in Figure 9 shows error reduction factors for the two-level methods and for three-level V-and F-cycle methods. Neglecting cycle costs, V-cycles lead to slower convergence than the corresponding two-level schemes, while F-cycles converge faster than the two-level schemes. However, taking into account that three-level F-cycles are twice as expensive as three-level V-cycles, the convergence rates of three-level V-cycles are better than those of three-level F-cycles, with an average error reduction over the first 10 iterations of three-level F-cycles with F-or FCF-relaxation of about 0.15 and 0.12, respectively, compared to squared average error reduction rates of three-level V-cycles with F-or FCFrelaxation of about 0.11 and 0.08, respectively. Note that the two-level results in Figure 9 with = 4 and FCF-relaxation come close to achieving convergence due to the exactness property, since the coarse-grid problem has only 64 temporal meshpoints, so we expect exact convergence in 32 two-level iterations. This may be the underlying reason for the prominent difference in convergence between F-and FCF-relaxation over the later iterations.

Investigating MGRIT convergence for linear elasticity
Performing similar experiments for the linear elasticity problem as for the linear advection problem in the previous section is computationally challenging. Additionally to the spatial Fourier symbol being a block matrix as a result of the system structure of the problem and of the mixed finite-element discretization, we consider Fourier frequency pairs on a two-dimensional tensorproduct mesh instead of on a one-dimensional mesh. Results in Section 5.1 show that LFA may not be a feasible option, and predictions of RA are pessimistic due to the necessary condition number factor in the computations. SAMA gives a practical improvement over RA, which is also true for SAMA when computing bounds instead of exact Euclidean operator norms. We therefore focus on computing the SAMA bounds given in Equation (42) when analyzing MGRIT convergence.
One remaining drawback of the modified SAMA analysis is that it still becomes expensive for large numbers of time steps. When considering larger coarsening factors, however, we need to increase the number of time steps due to the exactness property of MGRIT. To make the SAMA analysis computationally tractable for larger numbers of time steps, we sample spatial Fourier

FIGURE 9
Error reduction for two-and three-level variants applied to advection. At left, two-level methods with factor-2 and factor-4 temporal coarsening and at right, two-and three-level variants with same coarsest grid size.
frequencies on a discrete mesh with spacing ℎ = ∕16. Figure 10 shows that with this mesh spacing, SAMA captures the dominant convergence behavior and shows details in the spectral Fourier domain that are not visible with a coarser mesh spacing of ℎ = ∕8, and are not significantly different when considering a finer mesh spacing of ℎ = ∕32.

FIGURE 10
Error reduction factors predicted by SAMA using various mesh spacings, ℎ , for sampling the spatial Fourier frequencies for two-level MGRIT with factor-8 temporal coarsening applied to the linear elasticity problem discretized on a 32 2 × 256 space-time mesh. At left, worst-case error reduction and at right, error reduction per spatial Fourier mode for the fifth iteration. The top row of figures at right correspond to results with F-relaxation, while the bottom row shows results with FCF-relaxation; the columns at right correspond to ℎ = ∕8 (left), ℎ = ∕16 (middle), and ℎ = ∕32 (right).
In Figure 11, we look at the effects of the coarsening factor on error reduction. The left-hand side of the figure plots error reduction factors for the first 16 iterations of two-level MGRIT with factor-2 and with factor-8 temporal coarsening. While using a coarsening factor of two leads to good convergence behavior of both methods, we see divergence for factor-8 temporal coarsening in the first iterations. Note that the exactness property drives convergence of the iteration with FCF-relaxation and factor-8 coarsening at later iterations. At the right of Figure 11, we plot error reduction factors of the tenth iteration for both schemes (top and bottom rows) and both coarsening strategies (left and right columns) as functions of the Fourier frequency pair ( , ). Note that the overall "structure" of these plots is quite similar across the different algorithmic parameters (although they are plotted on slightly different axes to best show individual details), with excellent convergence at frequencies both close to and far from the origin in frequency space, and worst convergence achieved for a range of small (but not zero) frequencies.
This is quite possibly related to phase errors between the fine-and coarse-scale propagators 46 ; to what extent this mode analysis tool may provide insight into how to cure the poor convergence is a question left for future work.

FIGURE 11
Error reduction factors predicted by SAMA for two-level MGRIT with factor-2 and factor-8 temporal coarsening applied to the linear elasticity problem discretized on a 32 2 × 256 space-time mesh. At left, worst-case error reduction and at right, error reduction for the tenth iteration with F-relaxation and factor-2 (top left) and factor-8 coarsening (top right) and with FCF-relaxation and factor-2 (bottom left) and factor-8 (bottom right) coarsening as functions of the Fourier frequency, .
The discrete elasticity problem, given in Equations (13)- (15), depends on the material parameters and as well as on the discretization parameters Δ and Δ . To determine relevant parameter sets for convergence studies, we make the following observations: In (15), the scaling of Δ in the discrete divergence operator and of Δ clearly does not matter at all. In (14), the value of Δ matters as an absolute. For standard finite-element discretizations on uniform meshes in two spatial dimensions (as we consider here), the mass matrix can be written as a scaling factor of (Δ ) 2 times an operator that is independent of Δ , while the entries in the stiffness matrix are independent of Δ . Thus, we can rescale (13) by dividing through by Δ 2 , and rescaling to obtain (1∕Δ 2 ) + (Δ 2 ∕ Δ 2 ) + (1∕Δ ) ̃ = (1∕Δ 2 ) −1 − (Δ ∕ Δ 2 ) −1 . Since the scaling on the two terms involving differs by a factor of Δ , there are two natural parameters to consider: ∶= Δ ∕ Δ 2 and Δ . To perform a thorough set of experiments with these parameters with a reasonable computational complexity, we fix = 128. The left plot in Figure 12 shows that the choice = 128 is reasonable for experimenting with the parameters in the system since this choice ensures that convergence is not affected by the exactness property of MGRIT in the first few iterations. Furthermore, aside from effects of the exactness property for small , performance of the two-level methods does not depend on the number of time points. Note that varying the number of time steps changes the length of the time interval. This length can also be controlled by varying the time-step size (on a uniform-in-time mesh as we consider here). In the right plot of Figure 12, we consider three different time-step sizes Δ for fixed = 128 and = = = 1. Results show that as Δ decreases, there is an increasing initial jump in the worst-case error, but asymptotically, convergence appears to be independent of the timestep size. Furthermore, convergence of the two-level methods with -and with -relaxation is similar. After 10 iterations, -relaxation gives about a factor of six improvement in error reduction over -relaxation at twice the cost per itertation. To gain insight into effects of model parameters on convergence, we consider the parameter, , factored in the form = (Δ ∕Δ 2 )( ∕ ), i. e., we group discretization and material parameters, and we study its effect on convergence of the two-level methods. More precisely, we look at convergence for fixed by simultaneously varying Δ and Δ 2 or and , respectively, and at error reduction for varying by fixing three of the four parameters and varying the remaining one. Figure 13 shows effects of the material parameters on error reduction when the discretization parameters Δ and Δ are held constant. The left-hand side of the figure plots error reduction factors for fixed ratio ∕ = 1, i. e., simultaneously varying and (corresponding to fixed ), while at the right, error reduction factors are shown for various ratios ∕ for fixed = 1 (corresponding to varying ). The plots show that performance of both two-level schemes only depends on the ratio ∕ . In the limit of large ∕ , convergence is extremely fast, especially for the two-level scheme with -relaxation, and unsteady for -relaxation. This behavior is not surprising as this limit corresponds to a stiff material and, thus, the solution is approaching zero, i. e., oscillations are rapidly damped.

FIGURE 13
Error reduction for two-level methods with factor-2 temporal coarsening applied to elasticity discretized on a 32 2 × 128 space-time grid with mesh sizes Δ = 1∕2 and Δ = 1∕4 for various material parameters and . At left, we simultaneously vary and (corresponding to fixed parameter = (Δ ∕Δ 2 )( ∕ ) = 1) and at right, we vary the material parameter and fix = 1 (corresponding to varying ).
The ratio ∕ allows us to determine effects of the parameter on error reduction for large variations in . Using the slopes of the best-fit lines as a function of the iterations, we compute the average error reduction per iteration. The right-hand side of Figure 14 shows the average error reduction over iterations two thru 10 as a function of the parameter for Δ = 1∕4, Δ = 1∕2, and = 1 fixed, and various values of . In all cases, average error reduction is bounded by 0.5, showing good and robust convergence in a reasonable parameter regime. At the left of Figure 14, we plot average error reduction as a function of the time-step size for fixed parameter = 16. Results demonstrate that convergence does not change until we get to large time steps, again, indicating good and robust convergence.

FIGURE 14
Average error reduction over iterations two thru 10 for two-level methods with factor-2 temporal coarsening applied to elasticity discretized on a 32 2 × 128 space-time grid. At left, average error reduction as a function of Δ for fixed = 16 and at right, we vary the material parameter and fix Δ = 1∕4, Δ = 1∕2, = 1, and show the average error reduction as a function of = (Δ ∕Δ 2 )( ∕ ).

CONCLUSIONS
Currently, a key challenge in the development of parallel-in-time algorithms is achieving scalable algorithmic performance for hyperbolic PDEs. While some insight has been gained by both trial-and-error and more systematic computational studies in recent years, predictive analytical tools to aid in this development have been lacking in the literature. Here, we examine and extend mode analysis approaches, long used in the spatial multigrid community, to examine performance of methods from the Parareal/MGRIT class for two hyperbolic model problems. Our extensions lead to tighter bounds on performance that can be computed more efficiently than those existing in the literature. When applied to these model problems, we gain some insight into what poses essential challenges in developing algorithms for hyperbolic PDEs, and what parameter regimes, both physical and computational, are more or less difficult to handle. A key observation is that, while the FCF-relaxation that is typical in MGRIT is generally more effective than the F-relaxation typical in Parareal, the differences between these approaches in a twolevel setting is generally not substantial; when FCF-relaxation works best, F-relaxation has good performance, too, and when F-relaxation is ineffective, FCF-relaxation is not a magic cure (unless the exactness property becomes significant). While this work does not immediately give direction as to how to improve algorithmic performance, it offers predictive tools that can be used to identify and diagnose convergence difficulties and may help in designing or optimizing improved algorithms.

A SPATIAL FOURIER SYMBOLS FOR ELASTICITY OPERATOR
The Fourier symbols of the 2 mass and stiffness matrices using nodal basis functions can be computed using tensor products, ,