A high‐order WENO‐limited finite‐volume algorithm for atmospheric flow using the ADER‐differential transform time discretization

A high‐order‐accurate weighted essentially non‐oscillatory (WENO) limited upwind finite‐volume scheme is detailed for the compressible, nonhydrostatic, inviscid Euler equations using an arbitrary derivatives (ADER) time‐stepping scheme based on differential transforms (DTs). A second‐order‐accurate alternating Strang dimensional splitting is compared against multidimensional simulation with 2D transport using solid body rotation of various data. The two were found to give nearly identical accuracy in orthogonal, Cartesian coordinates. Orders of convergence are demonstrated at up to ninth‐order accuracy with 2D transport. 1D transport is used to confirm that error decreases monotonically with increasing order of accuracy with WENO limiting even for discontinuous data. Further, WENO limiting always decreased the error compared with simulation without limiting in the L1 norm. A series of standard 2D compressible nonhydrostatic Euler equation test cases were validated against previous results from literature. Finally, it was demonstrated that increasing the order of accuracy led to better resolved features and increased power for kinetic energy at small wavelengths.


Oscillation limiting
Instabilities often develop while numerically integrating the Euler equations from accumulation of energy at small wavelengths. This is related to "Runge" oscillations that develop in the presence of discontinuities with polynomial approximations of higher than first-order accuracy. Even if the simulation is stable without explicit dissipation, oscillations will still develop, which can contaminate certain added physical approximations such as microphysics and chemistry Lauritzen et al. (2012). Discontinuities need to be limited to remove oscillations while preferably still keeping the gradients as steep as possible. Element-based limiting typically treats all DOFs in an element together at once. There are variable second-order viscosity approaches Yu et al. (2015), higher-order "hyper"-viscosity approaches Skamarock and Klemp (2008); Dennis et al. (2012), optimization-based approaches that limit element extrema over a time step to that of the neighborhood Guba et al. (2014), WENO-based limiters Qiu and Shu (2004); Guo et al. (2016), modal high-frequency limiters Michoski et al. (2016), and flux-correction approaches Löhner et al. (1987); Kuzmin et al. (2012). Most if not all element-based limiting requires added parallel communication compared with the underlying scheme.
With FV schemes in particular, since a halo has already been transferred for reconstruction, that same halo can also be used for high-resolution limiting techniques that include essentially non-oscillatory (ENO) Shu and Osher (1988), weighted ENO (WENO) Liu et al. (1994), piecewise parabolic method (PPM) Colella and Woodward (1984), and non-polynomial recovery operators Xiao et al. (2002); Norman and Nair (2008). Other FV limiting mechanisms typically require additional parallel communication, including flux correction, viscosity, and hyperviscosity. This study uses WENO limiting.

Grid staggering
Grid staggering places different variables on cell centers, faces, edges, and corners (with implied directionality where appropriate) Skamarock et al. (2012); Arakawa and Lamb (1981). Certain aspects of staggering have also been generalized using differential forms in compatible finite-element techniques Natale et al. (2016) to exactly mimic certain properties of the underlying PDEs in a discretized setting. Notably, mimetic techniques can conserve quantities that are not in the vector of conserved variables. In the present study, a co-located grid ("A"-grid) is used. Many atmospheric discretizations rely on staggering to achieve better stability. The upwind finite volumes used here have enough inherent dissipation to achieve stability without explicitly added dissipation.

Dimensional splitting
Dimensional splitting, often expressed as a formal operator splitting, is the act of treating each dimension separately, and it can be done in different ways. One method is to apply a traditional operator splitting technique such as Strang splitting, which computes x, y, and z-direction tendencies in certain sequences, using updates from the previous direction before computing tendencies in the next MacNamara and Strang (2016). Another approach is to compute tendencies in all directions completely separately using the same forcing, and to couple them within each stage of a multistage operator, relying on the stages to generate cross terms automatically in the discretization Katta et al. (2015). There also exist splitting schemes that couple cheap nonconservative Lagrangian updates with more expensive flux-form updates (e.g., Leonard et al. (1996)). Dimensional splitting for finite-volume methods is desirable computationally because it typically reduces the computational complexity of the scheme by a factor of N (D−1) , where N is the order of accuracy of the scheme, and D is the number of spatial dimensions. Most splittings will suffice in an orthogonal coordinate system as shown in section 3.2.1, but in nonorthogonal coordinates, a straightforward Strang splitting gives very poor accuracy Norman and Nair (2018); Chen et al. (2017), though coupling dimensions through a multistage time integrator is still accurate in nonorthogonal coordinates Katta et al. (2015). However, for ADER-DT, there is no existing implementation that uses stages. For this reason, ADER-DT as currently implemented must be fully multidimensional in nonorthogonal coordinates as demonstrated in Norman and Nair (2018). However, since the present study only considers Cartesian coordinates, a second-order-accurate Strang splitting is used, and the errors associated with this approach are investigated in section 3.2.1.

Temporal discretizations
Unlike time-implicit discretizations Barros et al. (1995); Wood et al. (2014); Evans et al. (2019), time-explicit discretizations Zängl et al. (2015); Dennis et al. (2012); Skamarock and Klemp (2008) do not set up global dependence within each time step. Therefore, time-explicit time steps will be limited by stability ultimately because of the need to resolve finite propagation of information. Some applications require time steps even smaller than stability dictates to control errors. Time-explicit discretizations are generally categorized as multistep, multistage, or fully discrete Durran (2010); Norman et al. (2011a). The semi-discrete schemes (e.g., multistage or multistep) treat the spatially discretized (yet continuous in time) ODE with an ODE solver such as Runge-Kutta or Adams-Bashforth (or other varieties as well). Multistage and multistep methods must be strong stability preserving (SSP) to avoid having the temporal discretization itself lead to undesired new extrema Gottlieb (2005). Fully discrete schemes operate by propagating known spatial information into the temporal domain, the high-order generalization of which will either be Lagrangian Staniforth and Côté (1991) or arbitrary derivative (ADER) in nature Titarev and Toro (2002); Toro (2013). Lagrangian methods use characteristic trajectories of hyperbolic waves to link temporal and spatial information. ADER methods use the PDE definition itself, which directly relates spatial and temporal variation through derivatives. For instance, with linear uniform advection, u t = −u x , if we know the first spatial derivative, we can compute the first temporal derivative. By further differentiating in space and time, higher-order time derivatives can be computed from higher-order spatial derivatives. For a more in-depth description, please see section 2.5.
This study uses a time-explicit ADER scheme. Efficiently implemented, ADER methods have the desirable computational complexity of  ( N D+2 ) for the complete space-time operator Norman and Nair (2018).

Goal of the proposed method
The main hypothesis being explored in this study is that high-order accuracy, when properly limited, can provide increasing resolution even for complex and discontinuous flows. Here, we look at a chaotic nonlinear context with the Euler equations. The emphasis of this scheme is improving the accuracy per DOF for realistic flows and using a large effective CFL value (compared with existing time-explicit solvers).

New contributions of this study
This study is the first application of the ADER-DT time discretization to the 3D Euler equations. It is the first application of a new, optimized function-based WENO limiting scheme to the 3D Euler equations as well. The author is not aware of another publication that details the temporal-only convergence of an ADER algorithm by using a temporal order of accuracy much less than the spatial order to force temporal error to dominate the total error. Also, this paper performs an investigation into the errors associated with dimensional splitting in Cartesian geometry with an "A"-grid or collocated grid. This study also uses a new time-averaging of the flux time derivatives before the Riemann solver to reduce the number of Riemann solves and the volume of MPI data transfers in an ADER time discretization.

Algorithmic summary
For each dimension, the numerical method presented in this paper is summarized in Algorithm 1, where N is the spatial order of accuracy and N T is the temporal order of accuracy. Details of each step in the algorithm will be expanded throughout the rest of this section.

Inviscid Euler equations
The 3D compressible nonhydrostatic Euler equations, though inviscid, form the basis of atmospheric simulation and a good testbed for prospective numerical methods. The equation set describing this flow is: where is density, u, v, and w are winds in the x-, y-, and z-directions, respectively, is potential temperature related to temperature, T, by = T(P 0 ∕P) R d ∕c p , P 0 = 10 5 Pa, g = 9.8 m ⋅ s −2 is acceleration due to gravity, and p 0 = 10 5 Pa. The nature of the constant, C 0 , is to provide a convenient proportionality between p and ( ) in the equation of state. The hydrostatic density and hydrostatic potential temperature are related to hydrostatic pressure through the following relation: Equation (1) can be cast in vector form as t q + x f + y g + z h = s, where bold-faced variables represent vector quantities. This can be done with a nonsquare matrix-vector product.

8
-Compute N T − 1 time derivatives of the flux and state at each of the N T spatial GLL points using differential transforms of the PDE using Algorithm 4 (see section 2.5.3).

9
-Integrate state and flux time derivatives into a time-averaged state and flux using Algorithm 5.

10
-Store cell-edge-time averages of the state and flux vectors into large arrays for all cell edges in this MPI task.

11
-Parallel communication: Exchange flux and state cell-edge estimates with neighboring MPI tasks.

12
for each cell interface in this direction do 13 -Compute a single-valued numerical flux from discontinuous estimates from adjacent cells using the linear upwind Godunov state using Algorithm 2. 14 -Store single-valued cell-edge fluxes into large arrays for all cell edges in this MPI task.

15
for each cell do 16 -Update the fluid state using the time-averaged tendency for the cell as shown below: x-direction: ∕Δz + Δtŝ k wheref is the time-averaged flux, q is the cell-averaged fluid state, andŝ is the timeaveraged and cell-averaged source term.

Dimensional splitting
Fully multidimensional FV requires N D − 1 times more computations than dimensionally split FV. While a dimensionally split simulation can use a CFL value of one, a fully multidimensional simulation must use a smaller time step for stability. These two situations motivate the use of dimensional splitting where possible. A second-order-accurate alternating Strang splitting is used in this study, which reverses the order of the dimensional solves for each time step. If the update in a given direction can be cast as q n+1 = RHS (q n ), where n is the time step index, then the update proceeds as follows: To be more precise, the RHS x , RHS y , and RHS z terms are describing the full finite-volume updates in each of those directions, respectively, described in Equation (7). The same time step of Δt is used in all RHS updates.
For the dynamics solve, the equations in the x-, y-, and z-directions are: t q + x f = 0, t q + y g = 0, and t q + z h = s, respectively. For sake of simplicity, in the rest of this article each will be described in general as except for the z-direction, s ≡ 0.

Finite-volume semi-discretized equations
For the Finite-volume discretization, the 1D Equation (6) is integrated over a 1D local cell domain, , where x i±1∕2 = x i ± Δx i ∕2, and Δx i is the grid spacing for the cell of index i. This gives rise to the following semi-discretized equations: where q (x, t) dx andf i (x, t), ands i (x, t) are spatial reconstructions of the vectors f and s, respectively, over the ith cell domain, Ω i , (the time dimension is left continuous for now), and  ( f − , f + ) is a Riemann solver to reconcile the multivalued flux at a cell interface due to samples from disparate reconstructions at the same location in space (Algorithm 2). Equation (7) forms the basis of step 15 of Algorithm 1.

Spatial recovery operator
Nth-order-accurate FV methods recover intra-cell variation by projecting a stencil of N surrounding cell averages onto a polynomial of order N − 1. In the present case, and for reasons that will be more fully discussed, the stencil is projected onto N T spatial GLL points that span the domain Ω i , where N T is the temporal order of accuracy. The polynomial coefficients are computed using the constraints of Equation (12), and the resulting polynomial is sampled at the GLL points. The vertical profiles of density and potential temperature are dominated by hydrostatic balance, upon which relevant motions result from relatively small perturbations. Even small deviations from hydrostasis will lead to significant changes in the flow. Because of this and the fact that hydrostasis is not polynomial in nature, directly reconstructing hydrostatic variables with polynomials often leads to spurious motions that contaminate the simulation. Hydrostasis is defined by: where the H subscript denotes the hydrostatically balanced profile that depends only on the vertical dimension. Therefore, before reconstruction, hydrostasis is removed from and : ′ (x, y, z, t) = (x, y, z, t) − H (z) and ′ (x, y, z, t) = (x, y, z, t) − H (z). After the perturbations are projected onto GLL points, hydrostasis (which has been precomputed at GLL points) is added back: GLL,j = ′ GLL,j + H,GLL,j and GLL,j = ′ GLL,j + H,GLL,j , where j is the GLL point index within the cell.

Linear upwind Riemann solver
This corresponds to step 12 in Algorithm 1. For the dynamics, a characteristics-based flux vector splitting is used to compute an upwind flux using linear characteristics from a locally "frozen" background state. Consider a PDE: t q + x f = 0 with a flux Jacobian: A = f/ q. Left-multiplying the PDE by the flux Jacobian and assuming that it is locally constant in space and uniform in time gives t f + A x f = 0. The flux Jacobian can then be diagonalized into A = RΛR −1 , where R is a matrix whose columns are right eigenvectors, and Λ is a diagonal matrix whose diagonal components are eigenvalues, p . Using this diagonalization gives t f + RΛR −1 x f = 0. Left-multiplying by R −1 gives t w + Λ x w = 0, where w = R −1 f is the vector of five flux-based characteristic variables that each travel at a finite speed of p . The diagonalizations for each direction are given in Appendix A.
To compute the locally frozen flux vector, the linear average at the interface is used:q = ( q − + q + ) ∕2. Then, the locally frozen eigenmatrices,R,Λ, andL =R −1 are built on the averaged state,q. Next, the flux-based characteristic variable vector, w upw , is computed as: wherêj is the jth row ofL, j is the jth diagonal component ofΛ, and is a small tolerance to handle characteristic speeds very close to zero. Finally, the upwind flux is obtained as  ( f − , f + ) =Rw upw . The Riemann solver for a single cell interface is detailed in Algorithm 2.

Weighted essentially non-oscillatory (WENO) limiting
Weighted essentially non-oscillatory (WENO) limiting corresponds to step 5 in Algorithm 1. WENO interpolation uses multiple lower-order-accurate candidate polynomials, p L, i (x), each constrained using a sub-stencil of the data within a stencil of N values (for an Nth-order-accurate scheme). WENO then calculates the total variation (TV), see Equation (14) below, of each lower-order polynomial to determine how much it varies over the domain in question. Then, WENO weights the polynomials with the largest TV lower so that the weighted combination of polynomials is smoother. In regions of smooth flow, WENO performs little to no limiting. WENO limiting is depicted visually in Figure 1 with a discontinuity at x = 1/2. In this figure, a fifth-order-accurate polynomial is formed using five cell averages, and three quadratic polynomials, p L, j (x), are formed as well. The resulting polynomial is less oscillatory than the high-order polynomial.
In WENO interpolation, there is a set of "ideal" weights, i , such that a weighted combination of lower-order interpolations using these weights will give the full high-order interpolation. Traditionally, WENO schemes are applied only to pointwise spatial w U,j = (w R,j + w L,j )∕2 11 Compute the right eigenmatrix, R I , using q I . 12 Compute the upwind flux as f U = R I w U (matrix multiplication) approximations of the solution (e.g., at a cell edge in finite-volume schemes) Liu et al. (1994). In these schemes, the polynomial coefficients (which require a matrix-vector product) are never actually computed. Rather, a simpler dot product against stencil values gives the interpolation for each polynomial at a single point, x i . Next, a linear system is formed such that a weighted combination of lower-order interpolations at x i gives the high-order interpolation at x i . This system is solved to determine the ideal weights, and therefore, the ideal weights are determined by the interpolants and the grid geometry. This approach is not directly usable by ADER schemes because ADER needs higher-order variation in space so that it can translate that variation into the temporal domain via the PDE's relation of space and time (see section 2.5 for more information). Thus, for use with ADER, we need the actual polynomial coefficients. This study adopts the function-based WENO approach F I G U R E 1 Example of WENO limiting, assuming a grid spacing of 1, where P L (x), P C (x), and P R (x) are the left-biased, centered, and right-biased lower-ordered polynomials, respectively; and P H (x) and P W (x) are the high-order and WENO polynomials over the center cell's domain, respectively. Thick black lines denote the stencil average values for this example. Note the domains of P L , P C , and P R all overlap in the center cell. Note this figure also appears in Norman and Larkin (2020) [Colour figure can be viewed at wileyonlinelibrary.com] of Norman and Nair (2018), which is similar in nature to the WENO implementations in (Capdeville, 2008a;2008b;Norman, 2015b). In this approach, all of the polynomial coefficients are explicitly formed, and the WENO-limited polynomial is a weighted combination of the lower-order polynomials. The WENO-limited polynomial is valid over the center cell domain of N stencil cell averages.
In this study, for an Nth-order-accurate WENO reconstruction (where N is an odd integer), the lower-order polynomials are constrained according to Equation (11): where h = (N − 1) ∕2 is the size of the stencil "halo," q i is the average of the quantity being reconstructed in cell i, and i is the index of the stencil's center cell. What this describes is a moving window within the stencil to constrain the lower-order polynomials (see Figure 1 as an example). Next, the high-order-accurate polynomial is constrained according to Equation (12): Once these are computed, a "bridge" polynomial, p L, B (x), is created such that in a weighted combination with the lower-order polynomials, we retrieve the high-order polynomial as shown in Equation (13): While the bridge polynomial has higher-order derivatives, its accuracy is still lower-order. In this case, any set of ideal weights, j , will produce the high-order polynomial. Therefore, the ideal weights are free parameters that can be optimized to achieve desired properties in the resulting WENO interpolation. The only constraints placed on the ideal weights is that they be spatially symmetric, all positive, and that they sum to one. Larger weights for the bridge polynomial and the lower-order polynomials near the stencil center will result in a more oscillatory WENO interpolation.
Next, the total variation of each lower-order polynomials is computed according to Equation (14): The TV estimate is the integral of the squared derivative summed across all available derivatives and normalized by the grid spacing. The TV of the bridge polynomial has higher-order derivatives, and it is natural to assume it will be inherently larger than the other TV estimates. Therefore, a handicapping parameter, ∈ (0, 1], is introduced to reduce TV B to a lower value closer to the average of the other TV estimates. This is shown in Equation (15): The smaller the value of , the closer TV B becomes to the other TV estimates, and the more oscillatory the WENO interpolation becomes. The value for is also a free parameter that can be optimized to achieve desired properties in the WENO interpolation.
Next, the WENO weights are computed as the inverse square of the TV estimates so that the lower-order polynomials with the most variation are weighted the lowest. This is described by Equation (16): where is a small number used to avoid division by zero. This code uses = 10 −20 . Even in single precision (32-bit IEEE floating point), the smallest number that can be represented is 10 −38 , and this choice of has not caused any problems in single precision in any of the experiments. Next, the WENO-limited weights are normalized into a convex set of weights according to Equation (17): From (16) and (17), it is evident that if all low-order polynomials have the same variation (i.e., the flow is "smooth"), the ideal weights are recovered. Finally, the WENO-limited interpolation is given by Equation (18) below, where the only difference from Equation (13) is that the WENO-limited weights are used in place of the ideal weights.
The WENO limiting process is detailed in Algorithm 3. It was mentioned earlier that the ideal weights, j , and the bridge TV handicapping factor, , are free parameters that can be optimized for desired properties in the WENO interpolation. Therefore, a grid search optimization was performed on these parameters at each order of accuracy to minimize the L 1 error for one revolution of a "square wave" about the domain in uniform 1D transport (see section 3.1). Though the optimization did not explicitly penalize oscillations and overshoots, it turned out that optimizing for the lowest error implicitly also reduced oscillations significantly. The resulting ideal weights of  (12).
3 Compute the coefficients of the "bridge" polynomial, p L,B (x) using Equation (13).
this optimization are given in Table 1 along with the maximum overshoot in the 1D transport experiment with square wave data. The table shows that in all cases, optimizing for minimal L 1 error also happened to reduce the maximum overshoot in the simulations to values 5 × 10 −3 or lower for all orders of accuracy. This is a stark reduction in overshoot magnitude compared with the unlimited cases, which all had oscillations on the order of 0.1 or larger.

ADER-DT time discretization
This section corresponds to step 7 in Algorithm 1.

The ADER approach
The ADER time discretization Titarev and Toro (2002); Toro and Titarev (2005); Toro and Titarev (2006) can be thought of as a higher-order extension of the Lax-Wendroff Roe (1984) approach. Typical fluid . The maximum overshoot magnitude is also given for this optimization experiment with and without WENO limiting.
dynamics PDEs relate spatial derivatives of the fluid state to the temporal derivative of the fluid state. Consider 1D homogeneous transport with uniform wind: q t = −q x . If the first spatial derivative is known, the first temporal derivative is known as well. By differentiating to successively higher-order derivatives in space and time, any order of temporal derivative can be computed if that same order of spatial derivative is also known: q xt = −q xx , q tt = −q xt , etc. This process is described in literature as the Cauchy-Kowalewski (C-K) procedure, and it demonstrates that the temporal order of accuracy of an ADER method matches that of the spatial order of accuracy. One advantage of ADER time stepping is that it inherently maintains non-oscillatory properties of a limited spatial reconstruction because temporal information is computed directly from the spatial information. While the C-K method is straightforward to implement with symbolic mathematical software such as Maple or Sagemath (even exported directly to C or Fortran), in practice, this method is very expensive-especially compared with other realizations of the ADER approach to time discretization.
Other improvements to the ADER methodology include the Galerkin-in-time realization Balsara et al. (2009);Dumbser et al. (2008), which iterates the coefficients of time and space-time basis functions, given an initial knowledge of the spatial-only coefficients. These methods require fast transformations between modal and nodal space, and they also require iteratively solving nonlinear algebraic equations that are different for different orders of accuracy and different geometries. This is an efficient way to implement an ADER scheme, especially compared with the C-K procedure, and it has been implemented for both finite-volume and Galerkin spatial operators.

ADER with differential transforms
Another implementation called ADER-DT relies on differential transforms (DTs) to transform the PDE into a recurrence relation to generate higher-order time and space-time derivatives Norman and Finkel (2012); Norman (2013); Norman (2014). The temporal polynomial coefficients can be integrated to a time average and inserted into the spatial operator as if it were semi-discrete, requiring no changes to the original semi-discretized operator. The downside to the implementation in Norman and Finkel (2012) , which is incredibly expensive, compared with the expected efficient computational complexity of a space-time operator of  ( N D+2 ) . In Norman and Nair (2018), however, the ADER-DT approach was significantly improved by only computing DTs in the temporal dimension on a set of GLL points in space. This study uses the improved implementation in Norman and Nair (2018) expanded to the Euler equations.
The kth differential transform of a function, f , is the kth term in the Taylor series of f . The inverse transform is the Taylor series itself.
Using these two definitions, the rules to transform various nonlinear operators can be derived, and transform rules for most common operators can be found in Tsukanov and Hill (2000).
First, the spatial reconstruction procedure transforms a stencil of N cell averages of the model state, q, into N T GLL points, where N and N T are the spatial and temporal orders of accuracy, respectively. Next, the PDE system (6) is transformed into modal derivative space only in the temporal dimension using DTs, leaving the spatial dimension in nodal space on GLL points.

Temporal differential transforms of the Euler equations
Since the DTs are only being performed in time for a given GLL point, the spatial dimensions are neglected for simplicity, and the transformed PDEs become in the x-, y-, and z-directions, respectively: where a capital letter denotes the transformed function in time, and k t denotes the (k t )th-order DT in time, which is also the (k t )th term in the temporal Taylor series. Before computing the DTs of the flux and source terms, the following auxiliary functions are defined: ij (t) = q i (t) q j (t) ∕q 1 (t) and p ⋆ (t) = q 5 (t) . The symmetry ij = ji can be used to reduce computations. Given these functions, the following are the temporal DTs of the flux and source vectors in the Euler equations.
The flux terms, q i q j /q 1 , have the following differential transforms: Pressure has the following differential transform described in two steps in Equations (26 and 27), where P ⋆ is the differential transform of the expression p ⋆ = ( ) = q 5 (t) : These functions are initialized as follows: The summations used to compute the DTs for Φ ij (k t ) and P ⋆ (k t ) include the actual values Φ ij (k t ) and P ⋆ (k t ) themselves, even though they are not technically defined by that point. In these cases when a function's DT depends on itself, it is assumed to be zero in the summation. This can be implemented in practice by initializing each of these functions to zero before entering the loops to compute the DTs (see steps 1-7 in Algorithm 4).

Implementation specifics
To compute the spatial derivative of F (k t ), an N T × N T matrix,  D , is created that transforms the N T GLL points over the cell into N T polynomial coefficients, differentiates the polynomial, and then samples the differentiated polynomial at the same N T GLL points across the cell. This can be done with a single matrix-vector multiplication.

Algorithmic simplifications
Traditionally, ADER methods perform a Riemann solve not only for the flux (using more traditional solvers such as HLLC Toro et al. (1994)), but also for the temporal derivatives as well. This sets up derivative Riemann problems that are typically solved using the locally constant linear upwind state. In this implementation, however, the algorithm is simplified by computing a single, traditional Riemann solve on the time-averaged state. This has two advantages over traditional ADER approaches: (a) only one Riemann solve needs to be performed per cell edge per time step, and (b) less data needs to be communicated between parallel nodes. The errors associated with this simplification are investigated in section 3.3. This corresponds to step 8 in Algorithm 1.

Algorithm for computing time derivatives with DTs in the x direction
Algorithm 4 details how to compute the time derivatives of the state and fluxes efficiently in the x direction. The y and z directions are analogous. Local arrays of size (ngll,ngll) are needed, where the first index is for the modal time DT and the second index is for the nodal spatial GLL point) for the following variables:

Boundary conditions
In all test cases in this study, the vertical boundaries are assumed to be no-slip solid walls, and the horizontal boundaries are assumed to be periodic. Since perturbations are stored for and , the ghost cells for the solid wall are the same as the nearest physical domain value (mimicking a zero gradient) for everything but the normal (vertical) velocity, which is set to zero in the ghost cells. When GLL points of the state and flux are computed, the vertical velocity coinciding with the vertical boundaries is set to zero to enforce a no-flux boundary condition. Setting the vertical wind to zero in the vertical ghost cells acts to help the recovery operators near the vertical boundaries to approach zero naturally so that setting the actual boundary samples to zero does not create oscillations. The Straka density current test case is quite Algorithm 4: Compute time derivatives for the state and flux terms. This assumes zero-based indexing.

17
Compute rt_gamma(kt+1,s) using Equation (27) substituting kt+1 for k t in that equation, because we are computing the DT at order k t + 1.

2D option
Since the vast majority of test cases are in literature in only two spatial dimensions, the code can switch to 2D mode with a few simple modifications. First, only one cell is used in the y dimension, and the y location is assumed to be the middle of the y direction domain. Next, the v velocity is initialized to zero. Finally, the y-direction update in the dimensionally split time-stepping is not performed. Therefore, the state vector never feels the y direction forcing, and v remains zero for all time.

NUMERICAL TEST CASES AND VALIDATION
This study uses literature standard test cases to validate the expected behavior of the nonhydrostatic model. All experiments use an initial CFL value of 0.8 unless otherwise specified and use a constant time step throughout the simulation. All experiments use a temporal order of accuracy that matches the spatial order of accuracy unless otherwise indicated. Only the convergence studies are performed in double precision. All other simulations are performed in single precision. All error norms are computed at the final time, and they are defined as follows:

1D transport
High-order convergence will be demonstrated in section 3.2 in a 2D transport setting with smooth data.
However, in realistic atmospheric simulations, the data are rarely if ever smooth enough for high-order convergence to be observed in practice. It begs the question as to why one should bother with high-order accuracy. The answer is that even though high-order convergence is not observed, the constant on the error term still goes down with higher-order accuracy, particularly if suitable limiting is used. To demonstrate that the solution improves even with highly discontinuous data, a "square" wave, a "triangle" wave, and a cosine bell are advected in a 1D transport setting with uniform winds. The governing PDE is t q + x q = 0 on a domain of [0, 1], and q is initialized as follows: Figure 2 plots the initial data and final solution after one revolution about the domain with 50 cells and a CFL of 0.8 with ADER time stepping and WENO limiting over a variety of orders of accuracy. There is also a zoom of the square wave at different orders of accuracy along with a comparison of ninth-order accuracy with and without WENO limiting. The most notable aspect is that the jump in resolution from third order to fifth order is quite large, which seems to suggest that fifth order might be considered a minimum desired order of accuracy for a simulation with discontinuities and WENO limiting. Also, note that the resolution of the square wave's jump increases monotonically with increasing order, as does the resolution of the triangle wave. The cosine bell is a bit smoother, and as expected, all orders of accuracy greater than third order resolve it quite well. Note that with only 50 cells, even the cosine bell is not all that smooth as it is expressed on the coarse grid, but visual comparison is clearer at coarser resolution. Finally, note the significant reduction in oscillations at ninth-order accuracy with WENO limiting. Table 2 gives a more quantitative view of this test case with 100 cells and the same CFL value of 0.8. In all cases, the L 1 error is lower with WENO limiting than without. The WENO L ∞ error is better than the unlimited L ∞ error at third and fifth orders of accuracy but worse at seventh and ninth orders of accuracy. This suggests that WENO experiences some degradation in the steepness of discontinuities at the higher orders of accuracy relative to unlimited reconstruction. As the order of accuracy is increased, WENO limiting improves the error monotonically for all error norms. While unlimited reconstruction improves errors monotonically with increasing order for the L 2 and L ∞ errors, this is not the case in the L 1 error, which increases from seventh to ninth order. This demonstrates that with WENO limiting, high-order accuracy still reduces the error, even if the error is not converging in a high-order fashion. Finally, without limiting, the overshoots were all  ( 10 −1 ) , but with WENO limiting, the overshoots were  ([ 10 −5 , 10 −3 ]) , depending on the order of accuracy.

Dimensionally split versus fully multidimensional
Since this method uses dimensional splitting, it is worthwhile to investigate how this affects the accuracy in a context that easily allows exact error norms: 2D transport. To get an idea of how the second-order-accurate alternating Strang splitting affects the accuracy, solid body rotation (SBR) of a cone, a cosine bell, and a so-called "slotted cylinder" is performed for one full rotation with a multidimensional approach and a dimensionally split approach. In general, the multidimensional simulation must use a CFL value less than 1/2 while the dimensionally split can use a CFL value up to one. The domain is assumed to be [0, 1] × [0, 1] subdivided uniquely and equally by 100 × 100 equally spaced finite-volume cells in Cartesian coordinates.
The 2D transport PDEs are: TA B L E 2 Error norms for 1D transport of a square wave, a triangle wave, and a cosine bell with 100 cells They are solved in the same manner as the 3D Euler solver, with the upwind Riemann solver being defined as using the upwind value according to the wind velocities at an interface. For 2D WENO reconstruction, this is performed in a tensored manner for fully multidimensional reconstruction. SBR winds do not change in time, and therefore, the temporal differential transform is simple and linear, for example: For solid body rotation, the winds are initialized as u = − (y − 0.5) and v = (x − 0.5), where = 2 . The tracer field is initialized, as shown in Figure 3, with a linear cone ( 1 ), a cosine bell ( 2 ), and a slotted cylinder ( 3 ). They are defined as follows: where x C, 1 = 0.50, y C, 1 = 0.25, x R, 1 = y R, 1 = 0.15, x C, 2 = 0.25, y C, 2 = 0.50, x R, 2 = y R, 2 = 0.2, x C, 3 = 0.50, y C, 3 = 0.75, and x R, 3 = y R, 3 = 0.15. For this test, ninth-order accuracy in space and time is used to ensure the operators are as accurate as they can be to best highlight any dimensional splitting errors in time.

TA B L E 3 Error norms for a 2D transport test case with
ADER-DT time stepping, an upwind finite-volume spatial discretization, and solid body rotation of a slotted cylinder, a cone, and a cosine hill for one rotation about the domain with ninth-order accuracy in space and time with and without a second-order alternating dimensional splitting at varying CFL values using 100 × 100 cells with and without WENO limiting Note: T"DS" means dimensionally split, and "MD" means multidimensional. Table 3 shows error norms for solid body rotation of a cosine bell, cone, and slotted cylinder using 100 × 100 cells at varying CFL values with and without WENO limiting for a fully multidimensional discretization and a dimensionally split discretization. When both are run at their largest stable CFL values, the dimensionally split simulation actually performs slightly better, presumably because there are fewer spatial reconstructions required with the larger CFL value. When run at the same CFL value, the error norms are nearly identical. A plot of the solution with ninth-order accuracy and WENO limiting is given in Figure 3. Notice that small oscillations develop at the zero contour. None of these is greater than order 10 −2 . One interesting aspect of WENO limiting is that the algorithm sometimes senses that an extremum is "smooth". This happens inside the slot of the cylinder and at the top as well, where the number of grid points is small enough that the extremum looks smooth. A positivity filter will fix any negative values, and the extremum at the top of the cylinder is smooth without 2Δx oscillations. With more grid cells, the top of the cylinder will be better resolved, and the smooth extremum will go away as it does in the rest of the cylinder. This demonstrates that, in Cartesian geometry, dimensional splitting is introducing negligible error into the overall scheme over a range of CFL values up to the maximum CFL value. This is not necessarily sufficient to guarantee the same is true for the Euler equations, but it gives confidence toward that expectation. It is more difficult to judge this in the full Euler equations without any means of providing exact error norms. It is also not sufficient to demonstrate that the temporal error convergence is higher than second order. For that demonstration, please see the next section.

High-order convergence
To demonstrate high-order convergence for a sufficiently smooth function, 2D transport with uniform winds, u = v = 1, of a sine wave profile, q (x, t = 0) = (sin (2 x) sin (2 y) + 1) ∕2, is performed with a CFL value of 0.8 in all simulations with ADER-DT and WENO limiting at varying orders of accuracy and grid spacings. The data are revolved once about the domain. The results of this experiment are given in Table 4, giving the L 1 norms and orders of convergence. The dimensionally split ADER-DT WENO-limited scheme achieves the desired order of accuracy for each order. It was mentioned earlier that 3.2.1 was not sufficient to demonstrate that the temporal error is converging at high-order accuracy. Rather, it only demonstrates that dimensional splitting error does not dominate the overall error over a range of CFL values. Table 5 performs the same experiment performed in Table 4 except with a fixed spatial order of accuracy of ninth order with varying temporal orders of accuracy. This is done so that temporal error is more likely to dominate the overall error. Indeed, in Table 5, with a temporal order of accuracy significantly lower than the spatial order, the rate of convergence is exactly what is expected-that is, second, third, fourth, and fifth orders in time converge at exactly second, third, fourth, and fifth orders of accuracy, respectively. This demonstrates that, even though the dimensional splitting is second-order accurate, the constant on the splitting error term is low enough in Cartesian geometry that it does not show up in the overall temporal error. Further, Table 5 also demonstrates that at sixth-order accuracy in time, the spatial error begins to dominate, and the overall error converges at ninth order. Thus, with ninth-order accuracy in space, no benefit is gained from temporal orders of accuracy greater than six.

1D smooth Euler simulation to investigate flux time-averaging error
The algorithm in this paper also performs a time-averaging of the flux vector time derivatives before performing the Riemann solver. This introduces error because the integral of a nonlinear function is in general not the same as the nonlinear function of the integral. Due to the nonlinearity, the two operators do not commute without incurring error. TA B L E 4 L 1 errors and orders of convergence ("ord") for 2D transport of a sine wave with uniform winds and a CFL value of 0.8 at varying numbers of cells and orders of accuracy In more precise terms: To investigate the magnitude and nature of this error, the 1D Euler equations, Equation (39), are discretized with the same discretization used in this study: where p = C 0 ( ) . For this test, smooth initial data are specified as follows: In essence, , u, and p have a 20% sinusoidal perturbation with varying phases, each added to a mean value of one. The fluid behavior begins in a smooth fashion. However, similar to Burger's equation with a sine wave initialization, shocks eventually develop at around t = 0.93. Figure 4 shows contour plots of the time evolution TA B L E 5 L 1 errors and orders of convergence ("ord") for 2D transport of a sine wave with uniform winds and a CFL value of 0.8 at varying numbers of cells and temporal orders of accuracy of this density, u, and pressure up to time t = 1 with 1,000 cells at ninth-order accuracy with ADER-DT time stepping and WENO limiting as a grid-converged reference solution.

cells 50 cells 60 cells
The linearized upwind Godunov state Riemann solver uses the following flux Jacobian diagonalization and is otherwise treated exactly the same as the 3D Euler model (see section 2.3.4): To gauge the impact of averaging before the Riemann solver is applied rather than after, this model is implemented in two ways: (a) integrating the time derivatives of the fluxes before calling the Riemann solver (labeled as "flux-averaged"), and (b) performing the Riemann solver on all time derivatives of the flux and then integrating the time derivatives of the tendencies (labeled as "tendency-averaged"). The tendency-averaged approach does not incur the error that that flux-averaged approach does and should theoretically be the more accurate discretization of the underlying PDEs.
To compare the flux-averaged and tendency-averaged approaches, Tables 6 and 7 give the errors and convergence rates for both approaches using 100 and 200 cells, with a maximum CFL value of 0.8, at two different times: t = 0.2 (a very smooth solution) and t = 0.9 (just before the fluid develops shocks). This gives a sense of the behavior for smooth and nonsmooth solutions. In all cases, the differences between the two approaches are never more than 1%. There are cases where the flux-averaging approach is ≤1% more accurate, and there are cases where the tendency-averaging approach is ≤1% more accurate. Also, for the smoother solution at t = 0.2, the numerical methods all achieve the desired convergence rates in the L 1 error norm for the full nonlinear equations.

2D nonhydrostatic Euler test cases
For the 2D nonhydrostatic Euler test cases, only the fifth-order-accurate 2D version of the model with WENO limiting is used to validate the model against expected literature results (e.g., Norman et al. (2011b) and references therein). Section 4 will investigate the effects of higher-order accuracy and WENO limiting in the context of 2D nonhydrostatic Euler flow.

Hydrostatic initialization for Euler simulations
To initialize a hydrostatic atmosphere, N-point Gauss-Legendre quadrature is used to initialize cell means for an Nth-order-accurate method to ensure the atmosphere is initially balanced. For cell boundaries, exact estimates of the hydrostatic background state are sampled. Hydrostasis is a function of the vertical dimension only.

Neutral atmosphere
To initialize hydrostatic balance for a neutral atmosphere with constant potential temperature, 0 , it is easiest to obtain a vertical profile for Exner pressure, , rather than pressure directly. Exner pressure is a function of pressure Note: The baseline for error norms is a grid-converged solution at ninth-order accuracy using 1,000 cells averaged to a 100-or 200-cell grid for comparison.

TA B L E 7 Errors and convergence
rates for density with the smooth 1D Euler test case at time t = 0.9 using ADER-DT time stepping and a maximum CFL value of 0.8 only, given by: And its hydrostatic balance equation is given in terms of only potential temperature: The first two test cases assume a constant potential temperature basic state. Therefore, the hydrostatically balanced Exner pressure profile is trivial: where it is assumed that sfc = 1 meaning p = p 0 at the surface.

Constant Brunt-Vaisala frequency
To initialize a stable atmosphere, a constant Brunt-Vaisala frequency, N 0 , is specified. The Brunt-Vaisala frequency is given in terms of fractional vertical gradient of potential temperature: Therefore, Plugging this into (42), one eventually obtains the following vertical profile: The constants are set as follows: sfc = 1, sfc = 300 K, and N 0 = 10 −2 s −1 .
Pressure can be obtained from (43) or (46) by the Exner pressure equation, (41). Then, density is determined from the pressure by the equation of state.

Rising thermal
The rising thermal test case Wicker and Skamarock (1998) perturbs the potential temperature of a neutrally stratified atmosphere as follows: wherê= 2 K is the amplitude of the perturbation, (x 0 , y 0 , z 0 ) = (10, 10, 2 km) is the center of the thermal, and x R = y R = z R = 2 km is the radius of the thermal. This test case is simulated on a domain of [0, 20] × [0, 20] × [0, 10] km in the x, y, and z directions for 1,000 s. Winds are initially at rest. For a 2D domain, the y coordinate is always set to y 0 . Plots of the solution are given in Figure 5 using a fifth-order-accurate WENO ADER-DT scheme with 100 m grid spacing and a CFL value of 0.8 giving a time step of 0.23 s. To compare against literature results, please see Figure 5 of Ahmad and Lindeman (2007) and Figure 2 of Norman et al. (2011b). Quantitative comparison is difficult since it is a nonlinear flow regime wherein small initial condition differences and numerical differences eventually lead to large differences in the flow. However, the structure matches other studies well. Figure 6 also plots

Straka density current
The Straka density current test case Straka et al. (1993) perturbs the potential temperature of a neutrally stratified atmosphere as follows: wherê= −15 K is the amplitude of the perturbation, (x 0 , y 0 , z 0 ) = (26.5, 26.5, 3 km) is the center of the cold bubble, and x R = y R = 4 km and z R = 2 km are the radii of the cold bubble in the horizontal and vertical directions, respectively. This test case is simulated on a domain of [0, 53] × [0, 53] × [0, 6.4] km in the x, y, and z directions for 900 s. For a 2D domain, the y coordinate is always set to y 0 . This test case is traditionally run with a dynamic viscosity, which is defined by adding the following to the source term vector in Equation (??): where K = 75 m 2 ⋅ s −1 . However, it can be run without viscosity as well. Winds are initially at rest. The test run here

Inertia-gravity waves
The inertia-gravity wave test case Skamarock and Klemp (1994) perturbs the potential temperature of a stably stratified atmosphere (with N 0 = 10 −2 ) as follows: 52) wherê= 10 −2 K is the amplitude of the perturbation, (x 0 , y 0 ) = (100,100 km) is the center of the perturbation, and h R = 5 km is the horizontal radius. This test case is simulated on a domain of [0,300] × [0,300] × [0, 10] km in the x, y, and z directions for 3,000 s. For a 2D domain, the y coordinate is always set to y 0 . The horizontal winds are initialized to u = v = 20 m ⋅ s −1 (where v is set to zero in 2D simulations). Typically, this test case is solved with the horizontal to vertical grid spacing ration set to 10:1. Plots of the potential temperature are given in Figure 8 after 3,000 s using 100 m and 1 km grid spacing in the vertical and horizontal directions, respectively. With a CFL value of 0.8, the time step was 0.23 s. To compare against previous literature solutions for this test case, please see figs. 1-4 of Ahmad and Lindeman (2007); fig. 2 of Giraldo and Restelli (2008); and figs. 7 and 8 of Norman et al. (2011b).
To keep these plots from being noisy, this particular simulation was performed in double precision, since the thermal perturbation has a relative magnitude of 10 −5 compared with the background potential temperature.

Colliding thermals
The colliding thermals test case to the author's knowledge has not been presented in literature before. It exists to create stronger discontinuities, stronger winds, and more significant turbulence than the other simulations. The colliding thermals test case perturbs the potential temperature of a neutrally stratified atmosphere as follows: wherê= 20 K is the amplitude of the perturbation, (x 1 , y 1 , z 1 ) = (10, 10, 2 km) is the center of the warm bubble, (x 2 , y 2 , z 2 ) = (10, 10, 8 km) is the center of the cold bubble, and r = 2 km is the radius of the bubbles. This test case is simulated on a domain of [0, 20] × [0, 20] × [0, 10] km in the x, y, and z directions for 700 s. Winds are initially at rest. For a 2D domain, the y coordinate is always set to y 0 .
Plots of potential temperature after 0, 200, 400, and 700 s are given in Figure 9 using 100 m grid spacing. With a CFL value of 0.8, the time step was 0.23 s.

DISCUSSION OF LIMITING AND ORDER OF ACCURACY
In this section, only odd orders of accuracy are used (i.e., third, fifth, seventh, and ninth order). This is because a cell-centered approach naturally contains an odd number of cells in a stencil to use for recovery. It would, similarly, be more natural for a cell-edge-centered approach (for direct flux recovery) to use an even-ordered accuracy. Also note that when this paper mentions "order of accuracy," there is no implication that the results for multidimensional Euler dynamics are actually converging to high-order accuracy. That would be impossible for any method because the underlying data are not smooth enough for well-posed high-order derivatives. Rather, the implication is that the recovery operators are high-order accurate if the data are smooth enough (see section 3.2). Further, the hypothesis being tested is that even though high-order convergence is impossible in realistic atmospheric flow, the constant of error can still be reduced via higher-order-accurate recovery operators that are appropriately limited with high-resolution WENO limiting (see section 3.1). Figure 10 shows a plot of potential temperature perturbation for the inertia-gravity waves test case at fifth-order accuracy with and without WENO limiting at z = 5 km. The overlapping lines demonstrate that for smooth flows there is no effect when applying the WENO limiter, a designed property of WENO limiting. This is expected because it was demonstrated in the 2D transport context in F I G U R E 10 Plot of potential temperature at z = 5 km with and without WENO limiting for the 2D inertia-gravity wave test case using 300 × 100 cells. This is to demonstrate that WENO does not affect the smooth solutions [Colour figure can be viewed at wileyonlinelibrary.com] section 3.2 in the order of convergence study. However, it is helpful to confirm that this is still true in the context of the multidimensional Euler equations. Figures 11 and 12 show contour plots and line plots at x = 10 km of potential temperature perturbations with and without WENO limiting after 270 s of simulation at ninth-order accuracy to visually show the effects of WENO limiting for flows that become discontinuous at the front of the colliding thermals. WENO limiting slightly reduces the extrema and significantly reduces the visible 2Δx oscillations. Yet, the larger scale flow remains undamped in the cross section plot.

4.2
Order of accuracy Figure 13 shows the vertical wind for the rising thermal test case over varying orders of accuracy from third to ninth order, all using WENO limiting. Notice the increasing detail in the thermal center in particular. Of course, there is a limit to how much order of accuracy can help, particularly for smoother flows, since they are by definition already fairly well resolved. The differences are visible up to seventh-order accuracy, but for this test case, the differences between seventh order and ninth order are minimal. Figure 14 shows plots of potential temperature for the collision test case after 700 s of simulation. Again, notice the increasing amount of visible detail at smaller scales as the order of accuracy increases. Since this is a chaotic fluid that is now in a turbulent regime, the results are expected to diverge because of different numerical properties. But the general scale of the turbulent eddies present in the potential temperature plot becomes finer with higher-order reconstruction. Still, it would be helpful to take a more quantitative approach to this. Figure 15 plots the kinetic energy spectral power density as a function of signal frequency for the 2D collision test case with 400 × 200 cells and a CFL value of 0.8 with ADER time stepping. Spectral power density (SPD) is a measure of the squared amplitude of a signal at different frequencies. SPD, , of a signal is defined as:  ( ) = |c ( )| 2 , where is the frequency, and c ( ) is the the Fourier coefficient for that frequency. Practically, it is computed by squaring the coefficients of an FFT (in python, this is spd=numpy.abs(numpy.fft.rfft (data[:]))**2). To create the plot, the SPDs using only x-direction FFTs are averaged over 100 time steps throughout the simulation and over the 200 vertical levels to create something smooth enough for easy visual comparison between different orders of accuracy and different limiting mechanisms. Further, a moving average of SPD over four frequencies is used to create the final plot. Kinetic energy is calculated as KE = ( u 2 + v 2 + w 2 ) ∕2. Note that a similar study is performed in Norman and Larkin (2020), but that plot was too noisy to draw robust conclusions. Here, the results are averaged over many time steps and vertical levels to produce something smooth enough for clear comparison. When looking at Figure 15, the wavelengths get smaller as the frequencies get larger, meaning the smallest wavelength admitted in the flow (2Δx) is at the farthest right in the plot. One would expect that if the higher-order-accurate simulations really are resolving finer-scale structures, then there will be more power in the kinetic energy at the smaller wavelengths. Note that while the k −3 slope is plotted, it is not clear that this type of flow must follow that particular spectral slope. Further, this simulation has a relatively small inertial range for the energy cascade. In 2D, to create an inertial range, energy must be injected at the smallest scale because it will cascade upscale, while, in 3D, energy must be injected at the largest scales. With the collision of thermals, energy is created at small scales, but it is not a sustained injection of kinetic energy.
The most obvious result from this experiment is that the power at small wavelengths (large frequencies) increases as the order of accuracy is increased. This seems to confirm that the higher-order accurate reconstructions are indeed resolving finer-scale structures in the flow. There is some overlap at seventh and ninth orders of accuracy between 4Δx and 8Δx grid spacings, but between 4Δx and 2Δx grid spacings, the ninth-order scheme clearly has more power. Also, for each case, the WENO limiting decreases power in the spectra after about 16Δx grid spacing. The loss in power associated with WENO limiting is quite small at 16Δx and gradually increases as the grid spacing increases. From this plot, it seems there is relatively little value in moving from seventh-order accuracy to ninth-order accuracy in the unlimited regime. However, when WENO limiting is turned on, there is much more to be gained from going to ninth-order accuracy. Clearly, WENO limiting is not necessary for stability because there is no appreciable energy accumulation at the 2Δx scale. Therefore, it would be worth looking into limiting only potential temperature and density with WENO and leaving the wind velocities unlimited.
In summary, this experiment clearly demonstrates that at higher orders of accuracy, there is greater power in

Performance on graphics processing units (GPUs)
While a detailed description of the code's C++ performance portable implementation and analysis of this code's performance on GPU accelerators is given in Norman and Larkin (2020), it is worth mentioning a few interesting computational findings. GPUs typically have memory bandwidth rates (the amount of data that can be transferred per second) that are much higher than those of CPUs. For this reason alone, most applications enjoy a speed-up on GPUs. However, GPUs are also capable of significantly more computations per second than CPUs. Low-order-accurate methods do not use GPUs efficiently because they do not perform many floating TA B L E 8 Number of cells updated per second with and without WENO limiting over a range of spatial and temporal orders of accuracy on a single V100 GPU using 80 × 80 × 40 cells using single precision floating point arithmetic point operations per byte of data fetched from memory. High-order-accurate methods and WENO limiting improve on this because they increase computations without significantly increasing data movement. This is true even for small GPU workloads (see section 5.1.4 of Norman and Larkin (2020)). Table 8 shows the number of cells updated per second for varying orders of accuracy in space and time using 80 × 80 × 40 cells on one V100 GPU using single-precision floating-point arithmetic. At third order of accuracy in time, without WENO limiting, all of the spatial orders of accuracy are within 5% of one another in terms of runtime. At third-order accuracy in time, the overhead of WENO limiting at third, fifth, seventh, and ninth-order accuracy is 6, 28, 37, and 77%, respectively. There is a notable difference in performance moving from third-order accuracy to fourth-order accuracy in time. This is actually due more to how the kernel is compiled by the Nvidia nvcc compiler (register usage and GPU occupancy) than it is due to the number of computations. At fourth-order accuracy in time, all experiments are within 3% of one another, regardless of order of spatial accuracy or limiting. It might seem unexpected for performance to modestly improve with increasing spatial order of accuracy at fourth-order accuracy in time. While this is common with GPU kernels, the details that lead to these unexpected anomalies will be investigated in a future study. Note that Table 8 is related to table 4 in Norman and Larkin (2020), but in this case, an improvement to the ADER-DT algorithm has been made, the result of which is detailed in Algorithm 4.
The ADER-DT scheme has a maximum stable CFL value of one, which is roughly 2× the size of the maximum effective SSP CFL value of most (but not all) existing strong stability preserving Runge-Kutta time integrators Gottlieb et al. (2011).
Please note that this is not an attempt at an overall comparison between numerical methods, as that would involve many more considerations. Also, the cells updated per second metric neglects accuracy, limiting of oscillations, and time-step size.

CONCLUSIONS AND FUTURE WORK
The hypothesis being explored in this study is that high-order accuracy, when properly limited, can provide increasing resolution even for complex and discontinuous flows. This is furthering the study of Norman and Nair (2018) that explores this hypothesis in the context of transport. A brief survey of common discretizations was given in section 1. The mathematical discretization was described in detail in section 2. In section 3, a number of aspects of the discretization were validated using standard test cases. The effects of increasing order of accuracy and WENO limiting were explored in section 4.
It was found in 1D transport that the new algorithm lowers error monotonically with increasing order of accuracy up to ninth order even with highly discontinuous data. Notably WENO gave lower errors than unlimited reconstruction in every experiment in the L 1 norm, though this is not true in the L ∞ norm. In 2D transport, solid body rotation of various data demonstrated that dimensional splitting errors are negligible in Cartesian geometry even at a CFL value of 0.8. Also in 2D transport, high-order convergence up to ninth order is confirmed by transporting a sine wave. It was also found in 2D transport that, when the temporal order of accuracy is much lower than the spatial order of accuracy, time error dominates the overall error, and the solution converges at the expected temporal order of accuracy. A 1D Euler model was used in smooth and nonsmooth contexts to demonstrate that averaging the flux time derivatives before the Riemann solver has negligible effects on accuracy.
Next, 2D test cases including a rising thermal, a density current, and inertia-gravity waves provide validation of the new algorithm with ADER time stepping and WENO limiting using previous studies of discretizations of this equation set. It was found that structures in the flow were better resolved with increasing order of accuracy even with WENO limiting enabled. Spectral power density plotted as a function of frequency for kinetic energy confirmed that higher-order accuracy had more power at smaller wavelengths than lower-order accuracy. Finally, WENO limiting begins to lower the kinetic energy's spectral power at a grid spacing of about 16Δx with effects that gradually increase up to the 2Δx scale.
Future directions planned for this research include investigating a multistage implementation of the ADER-DT algorithm in hopes of a simple dimensional splitting in nonorthogonal coordinates similar to Katta et al. (2015). The computational expense of an unsplit approach will also be investigated. Next, the inevitable expansion into global spatial scales will require implementing either HEVI or vertically sound-proof techniques, much of which has already been investigated in various contexts Weller et al. (2013); Gardner et al. (2018). This will also require further investigation into how to couple a diagonally implicit Runge-Kutta (DIRK) scheme Alexander (1977) with an ADER-DT explicit discretization, if the ADER approach proves possible and worthwhile in that context.