Separating Physics and Dynamics Grids for Improved Computational Efficiency in Spectral Element Earth System Models

Previous studies have shown that atmospheric models with a spectral element grid can benefit from putting physics calculations on a relatively coarse finite volume grid. Here we demonstrate an alternative high‐order, element‐based mapping approach used to implement a quasi‐equal‐area, finite volume physics grid in E3SM. Unlike similar methods, the new method in E3SM requires topology data purely local to each spectral element, which trivially allows for regional mesh refinement. Simulations with physics grids defined by 2 × 2, 3 × 3, and 4 × 4 divisions of each element are shown to verify that the alternative physics grid does not qualitatively alter the model solution. The model performance is substantially affected by the reduction of physics columns when using the 2 × 2 grid, which can increase the throughput of physics calculations by roughly 60%–120% depending on whether the computational resources are configured to maximize throughput or efficiency. A pair of regionally refined cases are also shown to highlight the refinement capability.

The quasi-equal-area finite volume physics grid (hereafter referred to as "physgrid") method presented by Herrington, Lauritzen, Taylor, et al. (2019) was partly motivated by the need to support a finite volume (FV) semi-Lagrangian tracer transport scheme, known as CSLAM (Lauritzen et al., 2017). CSLAM uses tensor-cubic interpolation to map data from the FV physics grid to the dynamics grid and requires a stencil that extends beyond the element boundaries. Herrington, Lauritzen, Taylor, et al. (2019) use a similar cubic interpolation that is smooth across element boundaries for their physgrid implementation. This is advantageous since the boundary exchange with the surface component models is on the same grid, but it also carries the disadvantage that regional mesh refinement is much more complicated.
An alternative implementation of a finite volume physgrid was recently developed for use in the Energy Exascale Earth System Model (E3SM; Golaz et al., 2019, hereafter G19) that is designed to support regional mesh refinement of quasi-uniform grids. The part of the high-order finite-volume reconstruction that depends on a structured grid is local to a spectral element, and thus is independent of the topology of the spectral element grid.
The goal of this paper is to describe the physgrid implementation in the atmospheric component of E3SM and verify that it does not qualitatively affect the model behavior. The mapping technique is detailed in Section 2. Model experiments are described in Section 3 followed by analysis of the results in Section 4. Conclusions are discussed in Section 5.

High-Order, Property-Preserving Remap Operators
The E3SM atmospheric dynamical core is known as the High-Order Methods Modeling Environment (HOMME). The dynamics grid is built upon unstructured quadrilaterals of a cubed-sphere mesh with n_p × n_p Gauss-Lobatto-Legendre (GLL) quadrature points located at the roots of the basis set (Dennis et al., 2012; M. A. Taylor et al., 2007; M. A. Taylor & Fournier, 2010). A quasi-uniform cubed-sphere mesh is specified by the parameter n_e; this mesh has n_e × n_e spectral elements per cube face. A regionally refined mesh (RRM) retains the overall cubed-sphere structure but refines a region of the global domain to produce a locally fully unstructured mesh. While E3SM can be configured to run with any value of n_p, all configurations use n_p = 4, for which the basis functions are third-order polynomials (e.g., ne30np4 for the standard resolution configuration of E3SMv1). The evolution of the solution is communicated among elements via direct stiffness summation (DSS; Canuto et al., 2007) to form the global basis set, which leads to discontinuous derivatives at element boundaries (i.e., the solution is C^0).
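As a concrete illustration of the quadrature machinery (a sketch in NumPy, not E3SM code), the GLL nodes and weights for a given n_p can be computed from the Legendre polynomials; for n_p = 4 the interior nodes are ±1/√5:

```python
import numpy as np
from numpy.polynomial import legendre as L

def gll_points(n):
    """Nodes of the n-point Gauss-Lobatto-Legendre rule on [-1, 1]:
    the endpoints plus the roots of P'_{n-1}."""
    c = np.zeros(n); c[-1] = 1.0          # P_{n-1} in the Legendre basis
    interior = L.legroots(L.legder(c))    # roots of its derivative
    return np.concatenate(([-1.0], np.sort(interior), [1.0]))

def gll_weights(n):
    """Quadrature weights w_i = 2 / (n (n-1) P_{n-1}(x_i)^2)."""
    x = gll_points(n)
    c = np.zeros(n); c[-1] = 1.0
    return 2.0 / (n * (n - 1) * L.legval(x, c)**2)

x = gll_points(4)   # n_p = 4: endpoints plus +-1/sqrt(5)
w = gll_weights(4)
```

An n-node GLL rule integrates polynomials up to degree 2n − 3 exactly, which is the fact used for the quadrature sizing discussed later in this section.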
The new physics grid divides each quadrilateral element into n_f × n_f finite-volume (FV) cells that are uniform on the element's reference coordinate and approximately uniform after mapping to the sphere. Since E3SM always uses n_p = 4, in the case of a quasi-uniform cubed-sphere mesh, an atmosphere grid set is specified with just n_e and n_f (e.g., ne30pg2). The default physics configuration, in which physics columns exist on the GLL nodes, continues to be referred to as "np4" (e.g., ne30np4).
This section describes our physgrid remap algorithms.

Definitions
The dynamics and physics mesh parameters are (n_p, n_f). In this work, we assume n_f ≤ n_p. An element is a spectral element, and a subcell is a physics finite volume (FV) cell inside the element. Figure 1 illustrates an example of a spectral element. The blue square outlines one element, and the green filled circles mark GLL points. Each of the four red-dashed squares in Figure 1 is a subcell. We use "finite volume" in only a loose sense; the discretization shares with a fully FV discretization the characteristics of a constant basis function over a subcell and a high-order reconstruction combining data from multiple subcells, but there is no flux function as in a fully FV method. An element has reference domain R; this domain is the full element outlined in blue in Figure 1. Subcell c in an element has reference domain R_c ⊂ R.

The PR method uses the HR method as a component. For each FV subcell in an element, the PR method reconstructs the field within the subcell by applying the HR method to an n_f′ × n_f′ panel of nearby subcells, where n_f′ ≤ n_f. The panel's GLL basis has n_p′ = n_f′ nodes, so the single parameter n_f′ summarizes the panel. A subcell's panel is determined by centering it on the subcell, then shifting it by the minimal amount necessary in each direction to place the panel within the bounds of the element. This first step produces a reconstruction that is generally discontinuous at subcell edges. The reconstruction has order of accuracy n_f′ for smooth functions because it is the result of applying the HR operator.
Then the PR method applies the L^2 projection to project the subcell reconstructions onto the element's GLL n_p-basis. The L^2 projection of a function y(r) over R is the solution g of M g = ∫_R φ(r) y(r) dr, where φ(r) is the vector of GLL basis functions and M is the GLL n_p-basis mass matrix. In the case of the PR method, y(r) is the function over the element resulting from the panel reconstructions of f. The L^2 projection preserves the order of accuracy of the panel reconstructions, giving an overall n_f′-order method. Figure 2 shows an example of panels for the case of n_p = n_f = 5, n_f′ = 3. As in Figure 1, the blue solid line outlines the spectral element, the green filled circles mark GLL points, and the red dashed lines outline FV subcells. The black n_f′ × n_f′ panel provides reconstructions for a 2 × 2 grid of subcells in the southeast corner, marked with black dots. The yellow panel provides reconstructions for the two left-most subcells along the center row, marked with yellow plus signs. Finally, the cyan panel provides the reconstruction for the middle cell, marked with cyan × signs.
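The panel-placement rule just described can be sketched per coordinate direction; indices are 0-based, and the function name is ours, not from the E3SM source:

```python
def panel_start(c, nf, nfp):
    """First subcell index of the nf'-wide panel for subcell c: center the
    panel on c, then shift it the minimal amount to keep it inside [0, nf)."""
    s = c - (nfp - 1) // 2           # tentative centered start
    return max(0, min(s, nf - nfp))  # clamp the panel into the element

# For n_f = 5, n_f' = 3 (the Figure 2 configuration), applied per direction:
starts = [panel_start(c, 5, 3) for c in range(5)]
```

With n_f = 5 and n_f′ = 3, the corner subcells share a panel flush with the element edge while the middle subcell gets a centered panel, consistent with the panels shown in Figure 2.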
Two simplifications hold when n_f′ = n_f. First, every FV subcell has the same panel, and the panel covers the element. Second, the L^2 projection from the panel's GLL n_f′-basis to the element's GLL n_p-basis is equivalent to interpolation to the n_p-basis nodes, since n_f′ = n_p′ and n_f′ ≤ n_p. Thus, in this case of n_f′ = n_f, the HR and PR methods produce the same GLL solution.
The PR method satisfies neither R1 nor R2 if n_f′ < n_f. First, R2 cannot hold because n_f′ < n_f. Second, the L^2 projection generally violates R1. However, conditions R1 and R2 are still used in deriving the panel reconstructions.

Implementation
The A_p→f and A_f→p remap operators are linear and thus are computed once at model initialization; during time stepping, applications of these operators require only small matrix-vector products with matrices of size n_f^2 × n_p^2 and n_p^2 × n_f^2. Integrals are computed by Gaussian quadrature within each FV subcell. Since our application already has GLL nodes and weights of various degrees, we use GLL quadrature. For an integrand having polynomial degree d, the GLL basis with ⌈(d + 3)/2⌉ nodes is sufficient for exact integration.
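As a one-dimensional sketch of how such an operator can be precomputed (illustrative only; the model builds the 2D analogues from basis integrals at initialization), here is a GLL-to-FV averaging operator for n_p = 4, n_f = 2. Each subcell average of the degree-3 interpolant is a linear functional of the nodal values, so the operator is a small constant matrix:

```python
import numpy as np
from numpy.polynomial import Polynomial

gll = np.array([-1.0, -1/np.sqrt(5.0), 1/np.sqrt(5.0), 1.0])  # n_p = 4 nodes
edges = np.linspace(-1.0, 1.0, 3)                             # n_f = 2 subcells

def cell_averages(vals):
    """Average the degree-3 interpolant of the nodal values over each subcell."""
    antideriv = Polynomial.fit(gll, vals, 3).integ()          # exact: 4 points, degree 3
    return np.diff(antideriv(edges)) / np.diff(edges)

# Column j of A is the image of the j-th GLL cardinal basis function.
A = np.column_stack([cell_averages(np.eye(4)[:, j]) for j in range(4)])
```

Once A is stored, each application during time stepping is the small matrix-vector product A @ q, as described above.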

Element-Local Property Preservation
In the problem of property preservation, a property is a constraint that must hold nearly to machine precision, despite the approximations made in a discretization. We write "nearly" because in general the computation of the property has a condition number that, while small, is still larger than 1. For density and mixing ratio fields in the physgrid remap problem, the properties are, respectively, conservation of mass and nonviolation of the mixing ratio extrema in a discrete domain of dependence, or shape preservation. The second property also reestablishes tracer consistency if it does not already hold: if mixing ratio q is constant in one basis, then the lower and upper extrema are the same, and so q is mapped to a constant in the other.
For a density field ρ, the mass conservation constraint in an element is that the total mass be unchanged by the remap: Σ_i w_i^p J_i^p ρ_i^p = Σ_i w_i^f J_i^f ρ_i^f, where w_i are basis weights and J_i metric Jacobians. Both B_p→f and B_f→p conserve mass. The fact that B_p→f conserves mass, combined with requirement R1 (and, in the case of the PR method, the additional fact that an L^2 projection conserves mass), implies that B_f→p conserves mass. Because the B operators are linear and implement high-order remap, they cannot preserve shape in general. To preserve shape, we apply an element-local limiter after B. In each element, this limiter perturbs the input mixing ratio q to produce output q′ such that mass is conserved, Σ_i w_i J_i ρ_i q′_i = Σ_i w_i J_i ρ_i q_i; q′ is in bounds; and the difference diag(ρ)q′ − diag(ρ)q is 1-norm-minimal over all modifications that obey these constraints. The limiter can be applied to a field in an element in either basis. There are a number of limiters that provide such outputs q′; we use ClipAndAssuredSum, Algorithm 3.1 in Bradley et al. (2019).

Figure 2. Schematic of the panels used in the panel reconstruction method, for an element with n_p = n_f = 5 and panels with n_f′ = 3. As in Figure 1, the blue solid line outlines the spectral element, the green filled circles mark Gauss-Lobatto-Legendre points, and the red dashed lines outline finite volume subcells. Filled subcells of a particular color are associated with the 3 × 3 panel outlined by the same color.
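A simplified sketch of a CAAS-style limiter conveys the idea (this is our paraphrase, not the exact Algorithm 3.1 of Bradley et al. (2019)); m holds the mass weights w_i J_i ρ_i, and feasibility of the constraint set is assumed:

```python
import numpy as np

def clip_and_assured_sum(q, lo, hi, m):
    """Clip q into [lo, hi], then restore the original mass sum(m * q) by
    distributing the deficit in proportion to each cell's remaining capacity.
    Assumes feasibility: sum(m*lo) <= sum(m*q) <= sum(m*hi)."""
    mass = np.dot(m, q)
    qc = np.clip(q, lo, hi)
    dm = mass - np.dot(m, qc)                 # mass lost (or gained) by clipping
    cap = (hi - qc) if dm > 0 else (qc - lo)  # room to move in the needed direction
    total = np.dot(m, cap)
    if total > 0:
        qc = qc + dm * cap / total            # mass-weighted redistribution
    return qc
```

By construction the output stays in bounds and restores the mass exactly, which is the pair of properties required above; the 1-norm minimality of the published algorithm is not claimed for this sketch.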
In the physgrid remap problems, the mass conservation and shape preservation constraints always form a non-empty set. The shape preservation constraint set can be written min_i q_i^s ≤ q_j^t ≤ max_i q_i^s, where s denotes a quantity on the source grid and t on the target grid, j ranges over the target degrees of freedom (DOF) within the element, and i ranges over the source DOF in the element as well as possibly neighbors. A constant mixing ratio over the element satisfies these constraints; thus, the constraint set is not empty.

Global Grids
Now we discuss the steps to remap quantities from one global grid to the other.
In HOMME, the total mass in a model layer is given by the pseudodensity times the vertical coordinate differential, where pseudodensity is the derivative of hydrostatic pressure with respect to the vertical coordinate; see M. A. Taylor et al. (2007) for details, in which the pseudodensity is written ∂p/∂η. For purposes of this discussion, which is independent of the details of HOMME, it is sufficient to refer to a general density ρ whose integral over a volume yields a mass. We refer to scalars s and corresponding scalar densities sρ, as well as scalar tendency Δs and scalar density tendency Δ(sρ).
Remapping a scalar from GLL to FV bases is purely an element-local operation. If the limiter is to be applied, the bounds are the extremal GLL nodal values. First, the scalar density field is remapped. This same remap operator is applied to the total mass density; thus, if the scalar is a tracer mixing ratio, it is mass-tracer consistent already. Second, the limiter is applied to the scalar to limit new extrema. Each of these two steps is local to the element.
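The mass-tracer consistency argument can be demonstrated with any linear remap operator (the matrix below is a random placeholder, not the actual A_p→f): remapping the tracer density and the total density with the same operator maps a constant mixing ratio to the same constant.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((4, 9))
A /= A.sum(axis=1, keepdims=True)   # hypothetical linear remap (rows sum to 1)

rho = rng.random(9) + 0.5           # total mass density on the source grid
q = 0.7                             # constant tracer mixing ratio
rho_f = A @ rho                     # remapped total density
q_f = (A @ (q * rho)) / rho_f       # remap the tracer *density*, then divide
```

Because q is a scalar, A @ (q * rho) = q * (A @ rho), so q_f equals q identically; this is exactly the consistency property described above.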
In the FV-to-GLL direction, the temperature T f and velocity u f , v f data are tendencies. Tracer q f is typically a full state, but to avoid dissipating tracers by repeatedly remapping state, just the tracer tendency is remapped. The limiter is applied to tracers but not T f , u f , v f ; in what follows, the limiter steps are simply omitted for these. For concreteness, the following algorithm description refers to tracers in particular.
The algorithm proceeds as follows. To solve the correct property-preservation problem, limiter bounds must be for the full GLL tracer state. If the tracer density tendency is zero, the limiter should not alter the state. Thus, the bounds must include the extremal nodal values of q_p prior to the state update. First, within each element, the minimum and maximum (subsequently, extremal) FV mixing ratio state values are computed and stored. Second, in each element, the tracer density tendency is remapped with the linear, element-local remap operator. Third, to permit second-order accuracy (OOA 2), a halo exchange shares the FV mixing ratio state extremal values among directly neighboring elements. After the exchange, the bounds for an element's mixing ratio state are the extremal values from the element and its neighbors. These are then augmented by the state's previous GLL extremal values in the element. Fourth, the limiter is applied to the mixing ratio values. Fifth, the DSS is applied. After the DSS, mixing ratio extrema within an element are bounded by the source data within a 2-halo of the element rather than the original 1-halo. The DSS is globally tracer mass conserving, but an element's tracer mass after the DSS is altered in general.
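Steps one and three above amount to a small bounds computation per element; a sketch (the data layout and names are ours):

```python
import numpy as np

def limiter_bounds(q_fv, q_gll_prev, neighbors, e):
    """Mixing-ratio bounds for element e: extremal FV state values from e and
    its direct neighbors (shared via a halo exchange), augmented by the
    previous GLL extrema of e itself. q_fv and q_gll_prev map element id ->
    array of values; neighbors[e] lists directly adjacent element ids."""
    vals = np.concatenate([q_fv[n] for n in [e] + list(neighbors[e])])
    lo = min(vals.min(), q_gll_prev[e].min())
    hi = max(vals.max(), q_gll_prev[e].max())
    return lo, hi
```

Including q_gll_prev guarantees the zero-tendency case leaves the state unaltered, as required above.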
Returning to conditions R1 and R2 in subsection 2.2.2, neither condition holds in general throughout the full algorithm. Both property preservation and the DSS can violate each condition. These conditions were used just to derive the linear, element-local remap operators. Figure 3 illustrates the FV-to-GLL algorithm in one dimension. The parameters are n_p = 8, n_f = 5. In all figure parameter labels, we omit "=" and ",", trading this space for a larger font size; e.g., "n_p = 8, n_f = 5" is written "n_p 8 n_f 5." The FV data are C^1 continuous on the right side of the domain and discontinuous on the left. For simplicity, the previous step's state is the constant blue dotted line, labeled as "(0) previous step," and the total mass density is uniformly one (not shown). The difference between the FV stair-step curve and this blue dotted line is then the tendency to remap.
In the top row, the linear, element-local remap operator is applied to produce the dotted green curve labeled "(3) DGLL n_p 8." Here, "D" in "DGLL" denotes discontinuous; at this stage, the GLL field is discontinuous across elements. Large green filled circles mark GLL nodes. Note that element boundary nodes are two-valued. The red curves labeled with a "(2)" show the intermediate quantity in each of the HR (left) and PR (right) methods. In the HR case, the intermediate quantity is the DGLL n_p′ = n_f = 5 reconstruction. In the PR case, the intermediate quantity is the function formed by panel reconstructions with n_f′ = 3.
Moving now to the second row, ClipAndAssuredSum (CAAS) is applied to the green dotted curve to produce the green dashed curve labeled "(4) DGLL." The right column uses two sets of arrows to show how the bounds are obtained. The green upward arrows show the elements whose FV state data are used to provide an initial set of bounds for the element pointed to by the downward green arrow. The blue downward arrow shows that this initial set of bounds is augmented by the extremal values in the previous GLL state of that element.
Finally, the DSS is applied to the green dashed curve to produce the solid blue curve in the bottom row labeled "(5) CGLL." Here, the "C" denotes continuous; the GLL field is now continuous across elements. For reference, the GLL nodes are marked with red dots. Visually, the HR and PR methods reconstruct the smooth part of the tendency similarly, but the PR method produces lower-magnitude oscillations than the HR method across the discontinuity.

Remapped Quantities
Until now, this section has described remap algorithms general to any discretization involving spectral elements with embedded finite volume subcells. This subsection describes the quantities remapped in the E3SM atmosphere model.
Tracer mixing ratios are remapped using the methods described in subsections 2.2, 2.3, and 2.4.
Potential temperature density θρ is linearly remapped, in the GLL-to-FV direction as a full state and in the FV-to-GLL direction as a tendency, but no limiter is applied, consistent with its treatment in the dynamical core. Experiments show low sensitivity to the particular temperature variable type that is remapped. We choose potential temperature because it is conserved under dry adiabatic motion; that is, the integral of θρ in a Lagrangian parcel undergoing dry adiabatic motion is conserved. However, virtual potential temperature or just temperature could be used.
Velocities u, v are respectively the local-east and local-north velocity components. At and near the poles, these vary quickly in space and so will remap with more error than away from the poles. Instead of remapping these, we map the velocity vector to a reference-element coordinate system, remap each component in this coordinate system between grids (thus conserving the component's average value in this coordinate system), then map the result back to the local east-north system. Examples of element-local coordinate systems are contravariant, covariant, and 3D Cartesian. In our implementation, we use the contravariant coordinate system. A limiter is not applied to velocity, consistent with its treatment in the dynamical core.
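The rotate-remap-rotate sequence can be sketched as follows, with a single constant 2 × 2 transform J standing in for the per-point element coordinate transform (a simplification; the names are ours):

```python
import numpy as np

def remap_velocity(u, v, J, remap):
    """Map (east, north) velocity components into an element-local system via
    the 2x2 matrix J, remap each component there, then map back."""
    Jinv = np.linalg.inv(J)
    c1 = Jinv[0, 0] * u + Jinv[0, 1] * v   # element-local components
    c2 = Jinv[1, 0] * u + Jinv[1, 1] * v
    r1, r2 = remap(c1), remap(c2)          # any linear scalar remap
    return J[0, 0] * r1 + J[0, 1] * r2, J[1, 0] * r1 + J[1, 1] * r2
```

Remapping in the element-local frame avoids the large errors that the rapidly varying east-north components would incur near the poles.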
There is no vertical velocity tendency, but the physics parameterizations need access to the vertical velocity ω. This quantity's state is remapped as a scalar from GLL to FV grids, conserving ω's average value over a subcell, and no limiter is applied.
Topography data, geopotential ϕ_s^p and ϕ_s^f, are preprocessed and stored in a topography file. ϕ_s^p are the traditional GLL ϕ_s values; these are used in the dynamical core. ϕ_s^f are obtained by remapping from GLL to FV and then applying the limiter; these are used in the physics parameterizations. Prior to application of the limiter, the remap preserves average elevation within a subcell; after the limiter, average elevation is preserved within an element. Elevation variance and other derived data used in physics parameterizations are computed offline using ϕ_s^f and stored in the same topography file. See Lauritzen et al. (2015) for details on these derived topography data.
One detail specific to HOMME is the treatment of hydrostatic surface pressure p_s, which is needed in the physics parameterizations, and the remap of pseudodensity, which is needed for remap of other quantities. A full description of these quantities can be found in M. A. Taylor et al. (2007). The three key points for our purposes are as follows. First, the integral of pseudodensity over a column gives the hydrostatic surface pressure p_s. Second, given surface pressure, HOMME provides the vertical coordinate data necessary to compute pseudodensity over the column. Third, from the pseudodensity one can compute hydrostatic pressure at vertical column interfaces and midpoints. None of these three points involves remap. In both remap directions, we must simply remap p_s^p to obtain p_s^f (or vice versa), then use HOMME's existing vertical coordinate treatment to derive pseudodensity on the target grid.
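The three points can be sketched with a toy column (the values are illustrative, not E3SM's actual vertical coordinate):

```python
import numpy as np

p_top = 100.0                                   # model-top pressure (Pa), illustrative
dp = np.array([400.0, 900.0, 1600.0, 2000.0])   # layer pressure thicknesses (Pa)

# Point 1: the column integral of pseudodensity gives surface pressure.
p_s = p_top + dp.sum()

# Point 3: from the layer thicknesses, hydrostatic pressure at the
# interfaces and midpoints follows by accumulation.
p_int = p_top + np.concatenate(([0.0], np.cumsum(dp)))
p_mid = 0.5 * (p_int[:-1] + p_int[1:])
```

In the model, only p_s is remapped between grids; dp on the target grid is then rebuilt from p_s through the vertical coordinate, as described above.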

Numerical Verification
We verify our remap operators using a convergence study on a sequence of quasi-uniform cubed-sphere grids. A tracer field is created on the GLL grid, remapped to the FV grid, and then remapped back to the GLL grid. Then the relative error between the original and remapped fields is measured. We repeat this procedure on a sequence of increasingly fine grids. The tracer fields are described in Lauritzen et al. (2012): Gaussian hills, cosine bells, and slotted cylinders. These differ in smoothness: C^∞, C^1, and discontinuous, respectively. The grid sequence is expressed in terms of n_e: 5 to 320, increasing at each step by a factor of 2. We test a number of configurations. When we use the PR method, we set n_f′ = min(n_f, 3). Because the HR and PR methods are identical when n_f′ = n_f, we label those cases as "HR/PR" to indicate that either method would produce the result. Figure 4 shows results. The legend describes the line pattern, color, and marker for each algorithm. Triangles provide reference order of accuracy (OOA) slopes, with the OOA given as text in the triangle.
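The empirical OOA reported by such a study is simply the slope of error against resolution on a log-log plot; a minimal sketch:

```python
import numpy as np

def observed_ooa(ne, err):
    """Empirical order of accuracy: negated slope of log2(error) vs log2(ne)."""
    slope = np.polyfit(np.log2(ne), np.log2(err), 1)[0]
    return -slope

ne = np.array([5, 10, 20, 40, 80, 160, 320])  # the grid sequence used above
err = 1.0 / ne.astype(float)**2               # an idealized second-order error curve
```

Doubling n_e at each step, as in the study, makes the log2 spacing uniform and the fitted slope easy to compare against the reference triangles.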
The green dotted line with × markers shows that on a sufficiently smooth field and with no limiter, the OOA of the HR method is n_f as a result of condition R2. The blue solid line corresponds to our intended practical configuration: n_f = 2 with the limiter on for tracer fields. It achieves OOA 2 on a C^1 field. The red solid line shows that for a field having a finite number of extrema, the limiter permits OOA 3. This is because the limiter is triggered in only O(1) elements in this case. The green solid line is for n_p = n_f = 4, HR method, with limiter. It is jagged, but still monotonically decreasing, in the case of Gaussian hills because the limiter is triggered at just the two isolated maxima and is sensitive to the details of these maxima with respect to the grid. But the accuracy is always between the curves for n_p = 5, n_f = 4, HR method, with no limiter (dotted green) and n_p = 4, n_f = 3, HR/PR, with limiter (solid red), as we expect. For continuous but less smooth functions for which the limiter is triggered in many elements, the solid green line is straight. For the discontinuous slotted cylinders test field, OOA is sublinear in the l_2 norm, as we expect. In the l_∞ norm, error is roughly constant with refinement because of roughly constant-magnitude under- and overshoots at the discontinuity. We do not show curves for the HR method with n_p = n_f and no limiter; they are roughly at machine precision because of requirements R1 and R2.

Figure 4. Convergence study of the remap algorithms. Each column corresponds to a tracer field, whose name is in the column title. The top row shows l_2 error; the bottom, l_∞. The x axis is n_e, the cubed-sphere grid element parameter. The y axis is log_2 of the relative error. Triangles provide reference order-of-accuracy slopes. The legend provides line decoration details for n_p, n_f, high-order reconstruction or panel reconstruction method, and use of a limiter.
For the cases in which the PR method produces different results than the HR method (lines with unfilled markers), results are again as expected. For the Gaussian hills field with n_p = n_f = 4, PR, with limiter (dashed green line with unfilled circles), the OOA is 3 because n_f′ = 3. For the case with n_p = 8 (dashed black line with unfilled triangles), the explanation above regarding jaggedness applies. We omit curves using the PR method without the limiter because the OOA is limited to 3, and the dashed green line with unfilled circles already demonstrates OOA 3.

Figure 5 shows the tracer fields before and after the GLL-FV-GLL remap cycle; its left column shows the original field. Each subsequent column shows the results of applying the algorithm described at the top of the column. For the Gaussian hills and cosine bells fields, the logarithm, base 10, of the magnitude of the difference between the final GLL field and the original is shown, since showing the final field itself would provide little information. There are two different color scales used for these two rows: one for the first column, and the other for the remaining columns. For the slotted cylinders field, the final field is sufficiently different that it is informative to show it directly, and the colorbar applies to all columns. Some observations of interest are as follows. For the Gaussian hills and cosine bells fields, the errors for the case n_p = n_f = 4, HR method, center on the manifolds where the CAAS limiter is active: the Gaussian hills maximum, a point, and the cosine bells border, a circle. The PR method has larger errors for this n_p = n_f = 4 configuration, as expected. But the PR method is motivated by the slotted cylinders result for n_f ≥ 4. With the HR method, we see oscillations in the elements that overlap the border of the slotted cylinder in the case n_f = 4. With the PR method, these are substantially reduced. The case n_p = 8, n_f = 6, PR method, shows that the PR method with n_f′ = 3 suppresses oscillations resulting from discontinuous fields for large n_f.

Model Simulations
E3SMv1 was originally forked from the NCAR CESM version 1 (Hurrell et al., 2013) and has continued to evolve with a focus on targeting next-generation exascale computers. Notable differences between the atmosphere component in E3SM and CESM include the use of spectral element dynamics on a cubed-sphere grid and a higher model top (approx. 60 vs. 40 km). The various physics schemes are broadly similar, but many details differ in important ways (Rasch et al., 2019; Xie et al., 2018). The land, ocean, and sea ice components also differ significantly from CESM. The simulations presented here use active atmosphere and land components with prescribed sea surface temperature (SST) and sea ice.
To analyze the impact of the physgrid on E3SM, four simulations were conducted with 30 × 30 elements per cube face (ne30). The physics grids for these experiments are ne30np4, ne30pg2, ne30pg3, and ne30pg4. We also conducted np4 and pg2 simulations that include a 4× mesh refinement over the continental US similar to Tang et al. (2019). The high-order reconstruction (HR) method for FV-to-GLL remap was used in all simulations. All simulations were run on Cori-KNL at NERSC using two OpenMP threads per MPI rank. The ne30 simulations used 85 nodes (5,400 MPI ranks) and the regionally refined cases used 150 nodes (9,600 MPI ranks). An additional experiment was run with ne45pg2, which has almost the same number of physics columns as the ne30np4 case, to explore whether the effective resolution of the model is affected by the physgrid. All simulations were run for a total of 5 years. The use of 5-year simulations is a common practice in model development and is a trade-off between computational cost and signal-to-noise ratio, but there can still be low-frequency variability that can influence the results. Additional 5-day aqua-planet experiments with file output disabled were run to help assess the performance implications of the physgrid in section 4.4.

Figure 7. Surface topography of the ne30pg2 grid (a) and the regionally refined pg2 grid (b) over the continental United States. The regionally refined grid is identical to ne30pg2 outside the refined region with a grid spacing of roughly 1.5°, and the refined region consists of grid cells spaced by roughly 3/8°.
Since our primary goal is to compare the impact of changing the physics grid, we want to use the same grid for the surface components across all simulations. The land model grid is often specified to match the atmosphere grid, but for our purposes the land model is run on an equiangular 0.5° grid for all simulations. The ocean and ice components of E3SM use an unstructured mesh based on Voronoi tessellations (Ringler et al., 2013), and all configurations share the same grid for the ocean and sea ice components even though these components only provide prescribed conditions in the simulations here.

Figure 8. Precipitation (top), liquid water path (LWP, upper middle), column water vapor (CWV, lower middle), and 500 mb zonal wind (U500, bottom) for the ne30np4 case, with the physgrid cases shown as differences from the ne30np4 case. All data were regridded to a common 1° grid before averaging. Stippling indicates where the differences are statistically significant at the 95% confidence level using a Student's t-test.
Figure 6 illustrates the differences between the primary grids considered in this study using topography data focused on the continental United States for the ne30np4, ne30pg2, ne30pg3, and ne30pg4 grids. Each point of the np4 grid represents a nodal point of the spectral element grid that does not correspond to a specific area on the sphere, so the data cannot be visualized directly like cell-averaged data from an FV grid. To create a finite volume approximation of the np4 data, we use an iterative process to define a quadrilateral centered around each np4 point. The area of this quadrilateral matches the GLL quadrature weight of the node so that area-weighted global averages can be calculated. All grids capture the same general topographical features, but the pg4 grid naturally captures the most detail. The regionally refined grid is highlighted in Figure 7 for the pg2 cases. The mesh refinement is only considered at the element level, so both np4 and pg2 regionally refined cases start with the same refinement pattern to define the sub-element physics grid.

Model Validation
Our primary goal is to test that the physgrid in E3SM does not degrade the simulation relative to the configuration with physics columns on the GLL nodes, and for that purpose we can focus on comparing the simulations with each other. However, it is also useful to put our results in the context of previously published model intercomparisons, so we additionally compare our simulations to E3SMv1 AMIP experiments from G19 and data from the Coupled Model Intercomparison Project Phase 5 (CMIP5; K. E. Taylor et al., 2012). To make this comparison concise we compute spatial RMS errors relative to observations for annual and seasonal averages with the PCMDI Metrics Package (Gleckler et al., 2016). The observation data for these calculations is provided by the developers of the metrics package.

Climatology
To examine the simulation results, we start with a simple comparison of the 5-year mean precipitation, liquid water path, total column water vapor, and 500 mb zonal wind in Figure 8, shown as differences relative to the ne30np4 case. Similar analysis of the regionally refined cases is also shown in Figure 9. The data in all cases were regridded to a common 1° grid to facilitate a direct comparison. Statistical significance was determined by testing whether the differences of means were significantly different from zero with a two-tailed Student's t-test at a 95% confidence level, which is indicated with stippling in Figure 8. The degrees of freedom for significance testing at each location were set to five by reasoning that each year of data makes up an independent and identically distributed sample of the true annual mean climatology.
The climatological patterns of the physgrid cases are largely similar to their np4 counterparts. Some statistically significant regional differences are evident, but the spatial patterns of these differences are not consistent across physgrid cases, so we conclude that they do not indicate a fundamental change in model behavior driven by the change in the physics grid. Furthermore, small systematic differences are not completely unexpected from the change in the physics grid. Some of the small-scale differences, such as the dry bias near the Maritime Continent in the pg2 and pg3 cases, may be related to slight differences in how coastlines and topography are represented on the physgrid. The differences are also likely due in part to low-frequency variability that still occurs in simulations with a prescribed annual cycle of SSTs. Analysis of other quantities produced similar results (not shown).
For a more comprehensive comparison of the physgrid results we recreate the root-mean-square error analysis from G19 (see their Figure 9) in Figure 10, which compares the simulations presented here with 45 CMIP5 models (boxes and whiskers) and the AMIP simulation from G19 (red markers). The RMSE values for all simulations are within or below the envelope of CMIP5 results, and the physgrid simulations produce similar results to the baseline np4 case. Smaller RMSE values in these simulations relative to the CMIP5 data are not surprising due to the lack of systematic biases in the SST. Similarly, the slight deviations between our simulations and the G19 data are also not surprising because the G19 simulation used transient SST forcing that includes inter-annual variability, whereas the simulations here used a prescribed annual cycle of SST. Overall the RMSE analysis gives further confidence that the physgrid implementation does not qualitatively alter the solution of E3SM.

GLL Grid Noise
An interesting result of Herrington, Lauritzen, Reed, et al. (2019) is that noise in the vertical pressure velocity field (i.e., omega) around mountainous terrain that is present in both np4 and pg3 configurations was eliminated on the physics grid of the pg2 case. To investigate this in E3SM, Figure 11 shows the time mean omega at 500 mb on both the physics and dynamics grids, referred to as OMEGA and DYN_OMEGA, respectively.
Figures 11a-11d show that the noise over mountainous regions is eliminated in the physics grid data of the pg2 case, similar to the results of Herrington, Lauritzen, Reed, et al. (2019). Interestingly, the noise signal is still present on the underlying dynamics grid, although slightly reduced in magnitude for the pg2 case (Figures 11e-11h). Thus, the lack of noise on the ne30pg2 physics grid is mainly a result of smoothing by the coarseness of the physics grid relative to the dynamics grid. While it is good to be aware of this underlying noise when using the pg2 grid, we do not see this as a major issue since most model analysis will only consider data on the physics grid, although we recognize that it could still have an undesirable, albeit small, influence on the model behavior.

Figure 12. Power spectra of physics heating tendency calculated on a model level near 500 mb using dynamics grid data from ne30np4 (solid black) and ne30pg2, ne30pg3, and ne30pg4 (dashed) cases.

Figure 13. Kinetic energy power spectra calculated on a model level around 200 mb using physgrid data from ne30np4, ne30pg2, and ne45pg2 cases and dynamics grid data from the ne30pg2 case. A k−3 power law is depicted by the thin black line for reference.

Effective Resolution
Estimating the effective resolution of a spectral model is not a straightforward task (Laprise, 1992; Skamarock, 2004), but here we are only interested in any relative change of the effective resolution caused by the physgrid that might affect our interpretation of the model results. The mapping from the dynamics grid to the pg2 grid acts as a low-pass filter on the state, but it is not immediately obvious how a coarser physics resolution will feed back onto the dynamics.
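The low-pass character of mapping nodal data to fewer cell averages can be illustrated with a deliberately crude 1D stand-in: sample a smooth field plus a grid-scale oscillation inside one element, then replace it with two sub-cell means. This is only a toy illustration; the actual E3SM mapping uses a high-order, element-local finite-volume reconstruction, not simple averaging.

```python
import numpy as np

# Fine sampling of one spectral element on [-1, 1)
# (endpoint excluded so the oscillation covers whole periods in each half).
x = np.linspace(-1.0, 1.0, 2000, endpoint=False)
smooth = np.sin(np.pi * x / 2.0)          # resolved, element-scale signal
noise = 0.3 * np.cos(20.0 * np.pi * x)    # grid-scale oscillation
field = smooth + noise

def two_cell_average(f, x):
    """Replace f with its mean over each half of the element, a crude
    stand-in for mapping nodal values to two FV (pg2-like) cells."""
    out = np.empty_like(f)
    out[x < 0] = f[x < 0].mean()
    out[x >= 0] = f[x >= 0].mean()
    return out

# Averaging nearly annihilates the grid-scale component, while the
# element-scale structure of the smooth signal survives as cell means.
print(np.abs(two_cell_average(noise, x)).max())   # close to zero
print(np.abs(two_cell_average(smooth, x)).max())  # O(1) signal survives
```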
To investigate this question we first calculate spherical harmonic power spectra of physics temperature tendencies on the dynamics grid of the ne30 simulations using a model level near 500 mb (Figure 12). Using the physics tendencies on the dynamics grid provides a direct comparison to the baseline ne30np4 simulation data because the same map from model grid to the grid used to compute spherical harmonics is used for all cases. The spectra were calculated from daily snapshots after remapping to a 721 by 1,440 Gaussian grid to facilitate a spherical harmonic transform using a high order, conservative method (Ullrich & Taylor, 2015). The spectra of physics tendencies in Figure 12 indicate a lower effective resolution with pg2 as expected (dashed red line), which shows a slightly faster roll-off of power with increasing wave number. The pg3 and pg4 cases (blue and green dashed lines) more closely resemble the np4 baseline (black line).
While the spectra of physics tendencies directly show the effect of the mapping, we can also use the power spectra of kinetic energy to examine how the change in physics tendencies affects the behavior of the dynamics and how this is ultimately represented on the physics grid that is typically used for analysis. Figure 13 shows spherical harmonic power spectra of kinetic energy using physics grid data for the ne30np4, ne30pg2, and ne45pg2 cases and the dynamics grid data for the ne30pg2 case. The data were similarly remapped to a 721 by 1,440 Gaussian grid using a high order, conservative method for GLL data and a bilinear method for FV physgrid data.
The effective resolution can be estimated from the roll-off of the kinetic energy spectra at higher wave numbers. The ne30pg2 case begins to diverge from the ne30np4 case around wave numbers 30-40, suggesting a slightly lower effective resolution, as should be expected from the reduced degrees of freedom on the physgrid. However, the dashed red curve, which shows the spectra of dynamics grid data from the ne30pg2 case, lies directly on top of the ne30np4 spectra. So we conclude that the use of the physgrid does not degrade the effective resolution of the dynamics even though the representation of the model state on the physgrid is smoother.
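One simple way to quantify "where the spectra begin to diverge" is to find the first wave number at which the ratio of the two spectra drops below a tolerance. The spectra below are synthetic (a k−3 law with an imposed roll-off), chosen only to mimic the qualitative behavior described above, not actual model output; the 0.9 tolerance is likewise an arbitrary illustrative choice.

```python
import numpy as np

def divergence_wavenumber(spec_test, spec_ref, k, tol=0.9):
    """First wave number where spec_test falls below tol * spec_ref,
    a crude proxy for the scale at which effective resolution is lost."""
    ratio = spec_test / spec_ref
    below = k[ratio < tol]
    return int(below[0]) if below.size else None

k = np.arange(1, 201)
spec_ref = k ** -3.0                               # idealized np4-like spectrum
spec_test = spec_ref * np.exp(-((k / 60.0) ** 4))  # imposed high-k roll-off

print(divergence_wavenumber(spec_test, spec_ref, k))  # -> 35
```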
The ne45pg2 case provides an insightful comparison because it has almost the same number of physics columns as the ne30np4 case (48,600 vs. 48,602). We might naively expect a similar effective resolution between the two, but the spectra in Figure 13 show that this is not the case. The ne45pg2 data show that the effective resolution is still higher than ne30np4 due to the higher resolution of the underlying ne45np4 dynamics grid.
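The column counts quoted above follow from the cubed-sphere layout, assuming the standard counting for a conforming quadrilateral mesh: an neN grid has 6N² elements, each np4 element contributes (4 − 1)² = 9 unique GLL columns (plus 2 globally, by Euler's formula for the sphere), and each pgM element contributes M² finite volume columns.

```python
def gll_columns(ne, np_=4):
    """Unique GLL columns on a cubed-sphere neX np4-style grid:
    6*ne^2 elements with (np-1)^2 unique nodes each, plus the 2 extra
    vertices implied by Euler's formula for a quad mesh on the sphere."""
    return 6 * ne**2 * (np_ - 1)**2 + 2

def physgrid_columns(ne, n):
    """FV physics columns for an neX pgN grid: n x n cells per element."""
    return 6 * ne**2 * n**2

print(gll_columns(30))          # ne30np4 -> 48602
print(physgrid_columns(45, 2))  # ne45pg2 -> 48600
print(physgrid_columns(30, 2))  # ne30pg2 -> 21600
```

These counts reproduce the near-equality noted above: ne45pg2 has 48,600 physics columns versus 48,602 for ne30np4.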
HANNAH ET AL.

Model Performance
To summarize the performance impact of the physgrid, Figure 14a shows individual estimates of overall model throughput (small gray dots) and the average throughput over the whole simulation (large black dots) in simulated years per wall-clock day (sypd). The overall throughput estimates include the effects of interprocess communication and file output. If we only consider timing estimates of the dynamics calculations and convert these data into throughput values, the results are roughly the same across cases that share the same underlying dynamics grid (Figure 14b). For the same throughput conversion of physics timing estimates we see much more variation between cases due to the changing number of physics columns (Figure 14c). The pg2 case has significantly higher throughput due to having 4/9 as many physics columns, whereas the pg4 case has lower throughput due to having 16/9 as many physics columns. The pg3 case has almost the same number of physics columns as np4, which explains why the throughput is similar.
The spread of the individual dynamics throughput estimates is primarily caused by the interprocess communication required for the direct stiffness summation (DSS) that communicates information between elements. The individual physics throughput estimates are much more consistent due to the lack of interprocess communication. In spite of this fact, the physics calculations still do not line up with the idealized scaling we might expect from simply changing the number of columns (not shown). Figure 15 shows the throughput of 5-day aquaplanet simulations with file output disabled using 85, 43, and 22 Cori-KNL nodes. Colored lines show the ideal throughput based on scaling the results of the ne30np4 grid by the change in the physics columns for each physgrid case. Figure 15 illustrates that the departure from the ideal scaling is a result of using more nodes for higher throughput. A smaller number of nodes shows much better agreement with the ideal scaling, which highlights a loss of efficiency when running in a higher throughput regime. This is consistent with the results of Bertagna et al. (2019), who showed that Cori-KNL efficiency can vary with the amount of workload per node.
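The "ideal throughput" lines described above amount to scaling a baseline physics throughput by the ratio of column counts, under the assumption that physics cost is linear in the number of columns. A sketch with a hypothetical baseline number (the 10 sypd figure is invented for illustration, not a measured value):

```python
# Physics column counts for the ne30 configurations
# (np4 includes the 2 extra vertices of the cubed-sphere GLL mesh).
COLUMNS = {"np4": 48602, "pg2": 21600, "pg3": 48600, "pg4": 86400}

def ideal_physics_throughput(baseline_sypd, grid):
    """Ideal physics throughput for a physgrid case, assuming cost
    scales linearly with the number of physics columns."""
    return baseline_sypd * COLUMNS["np4"] / COLUMNS[grid]

baseline = 10.0  # hypothetical ne30np4 physics throughput in sypd
for g in ("pg2", "pg3", "pg4"):
    print(g, round(ideal_physics_throughput(baseline, g), 2))
```

Under this linear model pg2 would ideally run physics 9/4 (2.25x) faster than np4, consistent with the qualitative ordering in Figure 15; the measured speedups fall short of this ideal in the high-throughput (many-node) regime.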

Conclusions
This study details the implementation and impact of a quasi-equal-area finite volume physics grid for use in E3SM alongside the spectral element dynamics grid. The mapping between the dynamics and physics grids is constrained to be local to each spectral element to minimize inter-process communication and permit regional mesh refinement. The motivation to provide an alternative to the more common co-located grids approach was both to avoid the effects of grid imprinting and to improve performance by reducing the number of physics columns for a given dynamics grid.
While the new physics grid can be specified to have a finer average spacing than the dynamics grid with 4 × 4 finite volume cells per element (pg4), the model solution is shown to be qualitatively insensitive to this change. This corroborates the results of Herrington, Lauritzen, Reed, et al. (2019), who found that the resolved scale of the model is primarily determined by the effective resolution of the dynamics. Thus, a physics grid with 2 × 2 finite volume cells per element (pg2), which is coarser than the underlying dynamics grid, is preferable to optimize performance. Simulations show that the physics throughput can be increased by approximately 60%-120%, depending on the configuration of computational resources, using the pg2 grid over the co-located np4 grid.
Analysis of the simulated climate does not indicate any large systematic change due to the new physics grid, although noise over steep topography is notably reduced with pg2. In general, grid noise and imprinting is not easily detectable away from steep topography when using traditional physics parameterizations, but the noise becomes significantly amplified when using the Multi-scale Modeling Framework (MMF) configuration of E3SM (E3SM-MMF; Hannah et al., 2020), in which a cloud resolving model is embedded in each physics column to explicitly represent convective scale processes. The use of the pg2 grid with E3SM-MMF helps remedy the grid imprinting problem in addition to taming the substantial increase in physics cost associated with the MMF.