GFDL SHiELD: A Unified System for Weather‐to‐Seasonal Prediction

We present the System for High‐resolution prediction on Earth‐to‐Local Domains (SHiELD), an atmosphere model developed by the Geophysical Fluid Dynamics Laboratory (GFDL) coupling the nonhydrostatic FV3 Dynamical Core to a physics suite originally taken from the Global Forecast System. SHiELD is designed to demonstrate new capabilities within its components, explore new model applications, and to answer scientific questions through these new functionalities. A variety of configurations are presented, including short‐to‐medium‐range and subseasonal‐to‐seasonal prediction, global‐to‐regional convective‐scale hurricane and contiguous U.S. precipitation forecasts, and global cloud‐resolving modeling. Advances within SHiELD can be seamlessly transitioned into other Unified Forecast System or FV3‐based models, including operational implementations of the Unified Forecast System. Continued development of SHiELD has shown improvement upon existing models. The flagship 13‐km SHiELD demonstrates steadily improved large‐scale prediction skill and precipitation prediction skill. SHiELD and the coarser‐resolution S‐SHiELD demonstrate a superior diurnal cycle compared to existing climate models; the latter also demonstrates 28 days of useful prediction skill for the Madden‐Julian Oscillation. The global‐to‐regional nested configurations T‐SHiELD (tropical Atlantic) and C‐SHiELD (contiguous United States) show significant improvement in hurricane structure from a new tracer advection scheme and promise for medium‐range prediction of convective storms.


Unified Modeling at GFDL
As computing power increases, global atmosphere models are now capable of regular simulation at resolutions that had been the sole domain of regional atmospheric models. The Integrated Forecast System (ECMWF, 2019a(ECMWF, , 2019b of the European Center for Medium-Range Weather Forecasting runs on a 9-km grid, and the Global Forecast System (GFS; Sela, 2010) of the U.S. National Centers for Environmental Prediction (NCEP) runs on a 13-km grid. Some CMIP-class (Coupled Model Intercomparison Project) climate models now use grids with spacings as fine as 25 km (Chen & Lin, 2013;Haarsma et al., 2017;Vecchi et al., 2019). Global atmosphere models lack the lateral boundary errors that contaminate the solutions of regional models after a few days of simulation. They thus allow us to extend mesoscale and storm-scale predictions into the medium range and beyond (Harris & Lin, 2013;Harris et al., , 2019Zhou et al., 2019). Global modeling also brings many new challenges-one cannot "throw your garbage in the neighbor's yard" in global modeling, so to speak. Biases and radiative imbalances must be minimized, as must errors anywhere in the atmosphere that could potentially grow and contaminate the entire domain. A unified modeling system supports a variety of applications at a wide range of spatial and temporal scales within a single framework. These systems promise to simplify operational and research modeling suites and better exchange improvements and bug fixes between applications. The Unified Model of the U.K. Met Office (Brown et al., 2012) is the most notable unified system. Variable-resolution models McGregor, 2015) are particularly well suited for unified modeling as they can efficiently reach very high resolutions over part of the earth, replacing the highest-resolution regional models (Hazelton, Bender, et al., 2018;Zhou et al., 2019) and potentially extending their lead times.
Here at the Geophysical Fluid Dynamics Laboratory (GFDL), a hierarchy of models has been developed for a variety of time and space scales, from centennial-scale earth-system simulations  to very high-resolution weather prediction. The GFDL suite is unified around a single dynamical core, the GFDL Finite-Volume Cubed-Sphere (FV3) Dynamical Core (Putman & Lin, 2007), and a single framework, the Flexible Modeling System (Balaji, 2012), and other shared components. We describe one part of this suite, the System for High-resolution prediction on Earth-to-Local Domains (SHiELD). This model, previously called fvGFS, was developed as a prototype of the Next-Generation Global Prediction System of the National Weather Service and of the broader Unified Forecast System (UFS). SHiELD continues GFDL's high-resolution global modeling program previously established using the High-Resolution Atmosphere Model (HiRAM; Chen & Lin, 2013;Zhao et al., 2009). SHiELD couples the nonhydrostatic FV3 dynamical core  to a physics suite originally from the GFS (Han et al., 2017, and references therein) and the Noah Land Surface Model (Ek et al., 2003). SHiELD can be used for a variety of time scales but has been designed with a particular focus on short-to-medium range weather (18 hr to 10 days) and into the subseasonal-to-seasonal (S2S; several weeks to several months) range. Seasonal-to-decadal predictions and centennial-scale climate projections coupled to a dynamical ocean are performed at GFDL using the Seamless System for Prediction and Earth System Research (Delworth et al., 2020), the Coupled Model Version 4 (CM4; Held et al., 2019), and the Earth System Model Version 4 .
Since FV3 is designed to adapt to a variety of purposes and to any scale of atmospheric motion, it is an ideal platform for a unified modeling system. All of the SHiELD configurations described here, as well as regional and doubly periodic applications lying beyond the scope of this paper, use the same code base, the same executable, the same preprocessor, the same runscripts, and same postprocessing tools, demonstrating a true unification for modeling on weather-to-S2S time scales. This approach also suggests how further unification with GFDL's climate models may proceed, which use a different atmospheric physics (Zhao et al., 2018), the MOM6 Dynamical Ocean (Modular Ocean Model, version 6; Adcroft et al., 2019), and the GFDL Land Model, version 4 (LM4) land model. Advances in SHiELD can be seamlessly moved into other UFS models, including the 2019 upgraded GFSv15, and other FV3-based models. Most notably, advances in SHiELD can migrate into UFS models slated for operational implementation at NCEP, including the FV3-based GFSv15. NASA GEOS (National Aeronautics and Space Administration Goddard Earth Observing System; Putman & Suárez, 2011), NASA/Harvard GEOS-Chem High-Performance, The Community Earth System Model version using the FV3 dynamical core, and the Chinese Academy of Sciences' FGOALS (Flexible Global Ocean-Atmosphere-Land System Model; Guo et al., 2020) all also use FV3 as their dynamical core and can benefit from the advances described below. This diversity of FV3-based models shows the advantages of using common components to leverage advances in the dynamical core but while still allowing centers to tailor their models to their own needs, the freedom to innovate new model designs, and to encourage the development of models as holistic-integrated systems, rather than clumsily joining independent components. SHiELD is designed for exploratory research into model design and development, with a focus on dynamics and physics-dynamics integration, and for research on prediction and atmospheric processes on time scales from a few hours to a few months. SHiELD is currently focused on deterministic prediction although effective S2S prediction will require the development of a simple ensemble (cf. Chen & Lin, 2013). In this manuscript we use forecast skill as a principal means of establishing the scientific credibility of SHiELD as a research tool. Further research will more closely evaluate specific structures and processes within SHiELD, with some initial results described below (especially section 3.2) and in prior research (cf. Hazelton, Bender, et al., 2018).
The design, evolution, configurations, and simulation characteristics of SHiELD are the subject of this paper. Section 2 describes the components of SHiELD and how they work together as a complete modeling system. Section 3 describes the four configurations of SHiELD for a variety of applications, including medium-range weather, continental convection, tropical meteorology and hurricanes, and S2S prediction. Section 4 summarizes the history of SHiELD development and discusses prospects for future work.

Nonhydrostatic FV3 Dynamical Core
All SHiELD simulations use the nonhydrostatic solver within the FV3 Dynamical Core. This core has been described in detail in other papers (Harris & Lin, 2013;Lin, 2004;Putman & Lin, 2007, and references therein) and will only be summarized here. FV3 solves the fully compressible Euler equations on the gnomonic cubed-sphere grid and a Lagrangian vertical coordinate. Fast vertically propagating sound and gravity waves are solved by the semi-implicit method; otherwise, the algorithm is fully explicit. FV3 advances sound and gravity wave processes and advects thermodynamic variables on the shortest "acoustic" timestep, while subcycled tracer advection and vertical remapping (cf. Lin, 2004) are performed on an intermediate "remapping" timestep, in turn performed multiple times per physics timestep.
FV3's discretization along Lagrangian surfaces uses the piecewise-parabolic method, which previously used a monotonicity constraint to ensure positivity and to dissipate energy cascading to grid scale. In nonhydrostatic FV3, dynamical quantities (vorticity, potential temperature, and air mass) are advected by a nonmonotonic scheme to reduce dissipation of resolved-scale modes. Previous work with nonhydrostatic FV3 had continued to use a monotonic advection scheme to avoid unphysical negative values. In this manuscript we present results using a new positive-definite (PD) but nonmonotonic scheme to advect tracers, which greatly improves the representation of marginally resolved and discontinuous features without creating computational noise at sharp gradients. This scheme is described in detail in Appendix A and applications to the representation of tropical cyclones in section 3.2.

GFS/SHiELD Physics and Noah LSM
SHiELD inherits the GFS suite of physical parameterizations developed by the Environmental Modeling Center of NCEP (2020). The initial 2016 version of SHiELD, implemented for dynamical core testing during Phase II of Next-Generation Global Prediction System, used physics largely identical to the then-operational GFSv13: the Simplified Arakawa-Schubert (SAS) shallow and deep convection schemes described in Han and Pan (2011); the hybrid eddy-diffusivity mass-flux (EDMF) scheme (Han et al., 2016); the Rapid Radiative Transfer Model (Clough et al., 2005); the microphysics of Zhao and Carr (1997) and cloud-fraction scheme of Xu and Randall (1996); the Navy's simplified ozone scheme (McCormack et al., 2006); the GFS orographic gravity wave drag and mountain blocking schemes (Alpert, 2004); and the convective gravity wave drag scheme of Chun and Baik (1998).
We have since made many changes to the physics to be able to support new applications, especially for convective-scale prediction and marine phenomena, or to take advantage of new capabilities within the FV3 dynamical core. We first introduced the six-category GFDL microphysics and cloud fraction scheme  with the fast microphysical processes split out of the physics driver and taking place on the shorter remapping timestep. Later, the GFDL microphysics was fully in-lined within FV3 (Appendix B). Several new planetary boundary layer (PBL) schemes have also been used in SHiELD, including a modified hybrid eddy-diffusivity mass-flux (EDMF) PBL as per Zhang et al. (2015), and the Yonsei The land surface model is the Noah Land Surface Model (Ek et al., 2003), integrated within the physics and paired to the GFS surface-layer scheme. In 2017, Noah was upgraded to use the high-resolution land surface data (Wei et al., 2017), which greatly improves the appearance of land-surface fields in convective-scale simulations.

MLO
Initially, sea surface temperatures (SSTs) were prescribed as the climatological SST plus an SST anomaly from initial conditions which gradually decays to zero, without influence from the atmosphere. However, air-sea interactions are critical for several phenomena of interest to us, especially tropical cyclones and the Madden-Julian Oscillation (MJO), and may impact large-scale skill as well. To incorporate atmosphere-ocean interaction, we have implemented a modification of the mixed layer ocean (MLO) of Pollard et al. (1973). This simple ocean computes the mixed layer depth and heat within that mixed layer as prognostic variables, with tendencies computed from the net surface heat flux. The SST is nudged toward the NCEP Real-Time Global SST (Thiébaux et al., 2003) climatology plus a fixed initial anomaly which decays with a fixed time scale. The ocean mixed layer depth is also nudged toward observed climatology (de Boyer Montégut et al., 2004). While considerably simpler than the three-dimensional dynamical oceans in CM4 (Held et al. 2019) and in the GFDL Hurricane Model , the MLO still represents the thermodynamic and dynamic ocean interactions of greatest significance on the time scales for which SHiELD is used , without incurring the complexity of a three-dimensional dynamical ocean.

Interoperability With Other UFS Models
SHiELD was designed to work with other models that use FV3, Flexible Modeling System, the GFS Physics Driver, and/or the Interoperable Physics Driver (IPD). The IPD is the interface between FV3 and the GFS Physics Driver, although it can support other physics suites. Innovations within SHiELD can then be seamlessly exchanged with other models using these same components. The UFS Atmosphere led by NCEP (https://github.com/NOAA-EMC/fv3atm/) is analogous to SHiELD. For example, the transition of FV3 and the GFDL Microphysics into the operational GFSv15 was accelerated by the IPD. Conversely, schemes which have been introduced into the GFS Physics Driver by the broader community can then be integrated into SHiELD, including the numerous schemes implemented by Zhang et al. (2019).

SHiELD Configurations
SHiELD leverages the flexibility of FV3 to be able to make accurate and efficient simulations at a variety of spatial and temporal scales. Much of the development of SHiELD (and previously, of HiRAM) has been driven by a desire to improve the simulation quality at the convection-permitting resolutions covered by the range of SHiELD configurations.
We present four different configurations of SHiELD. All configurations are global domains using either a uniform grid or a locally refined grid using nesting or stretching (Harris & Lin, 2013;Harris et al., 2016;Zhou et al., 2019). SHiELD can also run on FV3's doubly periodic domain (Arnold & Putman, 2018;Held et al., 2007) or on a regional domain using any regular quadrilateral grid (Dong et al., 2020), at spatial resolutions down to a few tens of meters (Jeevanjee, 2017). These applications lie beyond the scope of this paper.
The four configurations can be fit within two "tiers"; Tier-1 configurations are the most well tested, having originally been developed as prototypes to replace legacy NCEP models by FV3-based UFS systems and having been run in near-real time for several years. These configurations demonstrate the capabilities of SHiELD, allow direct comparison to existing operational models, and provide robust tests of the forecast skill and reliability of SHiELD. Current real-time configurations are run twice daily and displayed online (https:// shield.gfdl.noaa.gov/).
The Tier-1 configurations are our flagship 13-km SHiELD, a prototype for the now-operational GFSv15 and for future upgrades of the GFS; (Tropical) T-SHiELD with a static, 3-km nest spanning the tropical North Atlantic, a prototype of the Hurricane Analysis and Forecast System (HAFS); and (Continental) C-SHiELD with a 3-km nest over the contiguous United States (CONUS), a prototype of the Regional Forecast System (RFS). Each of the Tier-1 configurations are usually refreshed every year with a new version, indicated by the year of the upgrade.
Our Tier-2 configurations address new challenges for numerical prediction and are still under development. Our 25-km (Subseasonal) S-SHiELD addresses the challenging domain of S2S prediction. Another configuration not discussed in this paper is the SHiELD global cloud-resolving model and addresses the frontier computational and data challenges of such simulations. This configuration was submitted to the DYnamics of the Atmospheric general circulation Modeled On Nonhydrostatic Domains intercomparison (Satoh et al., 2019;Stevens et al., 2019). Both configurations inspire the development of new functionality and capabilities within SHiELD and readily expose instabilities, climate drift, conservation issues, and other shortcomings. The advances driven by work on these frontier challenges help improve the Tier-1 configurations, demonstrating the value of a seamless prediction system. The domains for each of the four configurations plus the global cloud-resolving model configuration are depicted schematically in Figure 1.
Although all configurations follow the unified "one code, one executable, one workflow" structure of SHiELD, the configurations are not identical owing to the need to tailor each configuration for its specific application. Further, given the rapid pace of SHiELD development and the staggered development cycle for some of the configurations, we do not expect all of the Tier-1 configurations to always have the very latest developments. The development paths of the different SHiELD configurations can be seen in Table 1.
All configurations are initialized using the real-time GFS analyses made available by NCEP following Chen et al. (2018). This "cold starting" from the hydrostatic, spectral GFS could potentially leave the convective-scale configurations (T-SHiELD and C-SHiELD) at a comparative disadvantage to models with native, specialized convective-scale data assimilation. This issue is minimized here due to the ability of FV3-based models to "spin up" their convective scales within a few hours of initialization and experience little degradation thereafter Hazelton, Bender, et al., 2018;Marchok et al., 2018;Zhang et al., 2019).
Computational efficiency is crucial for useful simulation modeling, for both real-time and experimental applications. We present the timings for the most recent iterations of SHiELD in Table 2. The 13-km SHiELD needs only 3,096 processor cores to complete 1 day in under 8.5 min, the threshold traditionally used for operational global prediction. The 25-km S-SHiELD completes 1.5 years per day with just over 1,700 cores; we are hoping to improve the computational cost as part of further S-SHiELD development. C-SHiELD is necessarily more expensive owing to its nested grid but still completes a 5-day simulation in

Development of the Four SHiELD Configurations and Their Yearly Revisions Described in This Paper
Configuration

Journal of Advances in Modeling Earth Systems
under 2 hr on less than 3,500 cores. T-SHiELD has a nested grid with twice as many columns as C-SHiELD but is only about 30% more expensive.
SHiELD is compiled with mixed-precision arithmetic: The dynamics (and the inlined components of the microphysics) use single-precision arithmetic, while the physics uses double precision. This differs from the practice used for most operational models (GFSv15 excluded) and for GFDL climate models, which use double-precision arithmetic throughout. Tests with the 2016 version of SHiELD had found no detectable difference in skill between predictions using mixed-precision and double-precision arithmetic while leading to a cost reduction of about 40%.

SHiELD Medium-Range Weather Prediction
The flagship SHiELD configuration is designed for medium-range prediction with lead times of 24 hr to 10 days.   Note. Average performance statistics over a number of simulations are given. Time per day includes the initialization, termination, and I/O. All simulations were done on the C4 partition of Gaea, a Cray XC40 supercomputer with Intel Broadwell processors. All SHiELD simulations use FV3's nonhydrostatic solver.
The simulation characteristics and prediction skill of SHiELD have been previously discussed in several papers and will not be repeated here. Improving predictions of tropical cyclone track, intensity, and genesis has been a key driver of SHiELD development: Chen, Lin, Magnusson, et al. (2019)  The anomaly correlation coefficient (ACC) of the 500-mb geopotential height field is the standard means for evaluating the large-scale prediction skill of medium-range prediction models. Figure 3 (top) shows that the global ACC of SHiELD has been better at all lead times than the contemporary GFS since the 2017 version and significantly so on Days 1-6. At all lead times except for Days 7 and 8, each new version has improved upon the previous version. The result for root-mean square error (RMSE; Figure 3, bottom) is even more striking: Every version is an improvement upon the previous at every lead time, and both the 2018 and 2019 versions are significantly better than the operational GFS. Results for just the Northern Hemisphere (20-80°N; supporting information Figure S1) are less dramatic, but SHiELD still shows statistically significant improvements in ACC and RMSE out to Day 5. Both the GFS and all versions of SHiELD reach an ACC of 0.6 at 8.3-8.5 days globally and 8.5-8.7 days in the Northern Hemisphere, with some year-to-year and version-to-version variabilities.
The time series of Day 5 global ACC and RMSE ( Figure 4) shows that while there is a general secular improvement in both SHiELD and the GFS, there can be large seasonal and even interannual variability in forecast skill. Usually, predictions are more skillful in northern winter, as strong synoptic forcing dominates the large-scale weather patterns, but some northern summers see little to no forecast degradation. The implementation of GFSv13 on 11 May 2016, which included a major upgrade to the data assimilation Gray shading is the 95% confidence interval. Each version of SHiELD is evaluated with 2 years of 10-day hindcasts initialized at 00Z every 5 days, for a total of 144 cases per version. See Table 1 for the time periods being compared here.

10.1029/2020MS002223
Journal of Advances in Modeling Earth Systems cycling system of the GFS, significantly reduced RMSE in May and June 2016 compared to the preceding four months of the year. These results are worthy of further investigation. We do conclude that it may be misleading to use a short time period to evaluate or compare global prediction models.
The time evolution of the large-scale forecast skill for both the GFS and SHiELD are very similar on monthly and shorter time periods, which is expected as they use identical initial conditions, and SHiELD benefits from continual upgrades of the GFS initial conditions. As discussed in , the quality of the initial conditions is the preeminent factor in determining the forecast skill for the large-scale circulation as well as for metrics such as hurricane track forecasts that depend closely on the prediction skill of the large-scale flow.
These results are for hindcasts, but the ACC and RMSE for our real-time forecasts are nearly identical. An important caveat is that the operational GFS supports nearly the entire NCEP modeling suite, and so the GFS has many more demands and a much more stringent evaluation process imposed upon its development than does SHiELD. The development cycle of the GFS will therefore necessarily be less rapid and more methodological than that of SHiELD. Alternately, an experimental research model like SHiELD does have the freedom to pursue many different avenues for model development ("failure is always an option") so that the most successful new ideas can later be transitioned into operations, a major goal of the UFS.  Another sensible weather metric is the 2-m temperature, which has an interesting development history (Figure 8). The initial 2016 version of SHiELD had a very small warm bias, significantly less than the small (0.3 K) warm bias of the operational GFS. The 2018 version of SHiELD, which otherwise had significant improvements in other skill metrics, developed a cool bias which increased to 0.6 K by Day 10. Investigation traced the cool bias to two sources: the switch from the hybrid EDMF PBL to YSU, which by default has significantly less near-surface mixing and thereby allows the surface to cool too much, and the change in how cloud droplets absorb radiation when the Inline GFDL Microphysics was introduced. In 2019, the cloud-radiation interactions were significantly revised, and the background diffusion in the YSU PBL was increased, which significantly reduced both the cold bias and the error in 2-m temperature. The cold bias in SHiELD 2019 ranges from 0.1 K on the first day to 0.35 K on Day 10, which is approximately equal to the positive bias of the operational GFS.

T-SHiELD North Atlantic Nest for Tropical Cyclone Prediction
T-SHiELD uses the variable-resolution capabilities of FV3 to replicate the tropical cyclone track skill of global models and the intensity skill of convective-scale regional hurricane models. This configuration uses the 13-km SHiELD grid and then places a large factor of 4 two-way nest over the tropical North Atlantic (Figure 1). The resulting nested domain has grid cells of about 3-km width and interacts with its parent global domain. Earlier experiments and a comprehensive evaluation of T-SHiELD 2017 were described in   Hazelton, Bender, et al. (2018) and . T-SHiELD has been used as the initial prototype for the HAFS . Here we will describe further evolution of T-SHiELD, including progress toward rectifying two forecast issues in T-SHiELD 2017: an underintensification bias for rapidly intensifying storms and storms with a radius of maximum winds (RMW) that is too large. Note that there is no 2019 version of T-SHiELD.  found that the RMW in T-SHiELD 2017 was often larger than observed and in particular larger than that in HWRF simulations from the same set of cases. Zhang et al. (2015) found that reducing the parameterized mixing in the PBL scheme reduced the size of the RMW in HWRF. While reducing the parameterized mixing in the hybrid EDMF scheme gave modest improvement to hurricane structure in T-SHiELD, there was no appreciable reduction in the size of the eyewall. A dramatic and immediate impact was instead found by using the PD advection scheme for water vapor and microphysical tracers. Results from T-SHiELD 2018 simulations of Major Hurricane Irma, initialized prior to its rapid intensification, show that a simulation using the older monotonic advection scheme ( Figure 9) produces a gradually expanding vortex that does not intensify. Meanwhile, the simulation with the new PD scheme and no other changes to the physics or dynamics, including advection of dynamical quantities, produces an intensifying storm with a contracting eyewall. Notably, the vertical velocity within the eyewall is much more coherent with the PD scheme and is continually displaced within the eyewall, which we suspect may be driving both the intensification of Irma and a continued contraction of the eye, as well as contributing to enhanced precipitation within the eyewall. For this reason, the PD advection scheme was selected for T-SHiELD 2018.

Journal of Advances in Modeling Earth Systems
A more systematic comparison of wind radii between the 2017 and 2018 T-SHiELD versions (Figure 10d) shows that the effect of the PD scheme is not limited to a single storm. Noting that the difference between the two T-SHiELD versions is more than just the PD scheme, we do see a systematic and substantial decrease in the radius of the 64-kt (33 m s −1 , hurricane force) winds in the 2018 version. The 2018 version spins up the vortex such that within 36 hr of initialization, the 64-kt radii reduce to and then remain at a consistent 20-25 nautical miles (nm; 37-46 km) for the rest of the forecast period. This represents a reduction of more than half at 120-hr lead time compared to the 2017 version, which steadily widens the 64-kt radii during the simulation. There is also a reduction in radii forecast errors compared to Best Track estimates in T-SHiELD 2018, with the qualification that there is considerable (potentially 40% for 64 kt; Landsea & Franklin, 2013) uncertainty in estimates of wind radii. This uncertainty can impact the initialization of tropical cyclones using real-time storm message files (Bender et al., 2017) and thereby of estimates of size-related impacts like precipitation and extreme winds.
The multiple changes in the 2018 version of T-SHiELD combined to create tropical cyclones which are stronger overall (Figures 10a and 10b), with little to no bias toward more intense storms at all lead times. There is a minor degradation in track error in the 2018 version at longer lead times (Figure 10c). The adoption of the

Journal of Advances in Modeling Earth Systems
PD scheme and YSU PBL scheme likely created forecasts of more intense storms mitigated by the introduction of the interactive MLO. While the weak bias of the 2017 version was alleviated, intensity predictions were not appreciably improved except at 120-hr lead time and, in fact, were degraded between 36 and 72 hr after initialization. These results show once again the great challenge of improving intensity prediction. The reduction in RMWs in simulations using the PD scheme will be discussed in more detail in a forthcoming manuscript.

C-SHiELD Nest for Continental U.S. Convection
C-SHiELD was designed to efficiently reach convective-scale resolutions in a global domain, in this case to replicate the capability of regional convective-scale models for continental convection such as the 3-km North American Model (NAM) Nest and the members of the High-Resolution Ensemble Forecast. C-SHiELD also is designed to extend convective-scale forecasts beyond the 18-to 60-hr ranges of existing U.S. operational CONUS models into the medium-range time scales and beyond. The nested domain of C-SHiELD serves as a prototype for the RFS  and the Rapid-Refresh Forecast System (Alexander et al., 2020), both using the regional domain capability being developed within FV3.
The 2017 version of C-SHiELD is described in Harris et al. (2019). Modified versions of C-SHiELD with different microphysics and PBL schemes are described in Zhang et al. (2019) and Snook et al. (2019). C-SHiELD 2018 saw considerable updates as shown in Table 2; C-SHiELD 2019 added incremental updates, including reconfiguration of the numerical diffusion and GFDL microphysics. We will limit our discussion to the evolution of broad forecast characteristics, but we will perform year-round validation instead of restricting the analysis to a single season. The time periods evaluated are given in Table 1. The exception is for the Surrogate Severe verification below, which is only verified for peak severe weather season of April to August of each year.
Precipitation forecast skill (Figure 11, top panels) is similar among all three versions of C-SHiELD. The 2019 version has the least overall bias (Figure 11, bottom panels) as earlier versions had too much light and too little heavy precipitation. The 2019 version reduced the diurnal cycle in the bias of light and moderate precipitation, although this was still apparent in the bias score for heavy precipitation and still had a prominent high bias of heavy precipitation during the first 30 hr. We speculate that the reconfiguration of the numerical diffusion, which improved storm placement, and the revised settings for the GFDL microphysics, which improved structure and evolution of the storms, combined to improve the biases in the 2019 version.
We use the surrogate severe technique of Sobash et al. (2011) to validate our 2-to 5-km updraft helicity (UH) fields against storm reports from the Storm Prediction Center. This is a well-established method used for evaluation of convective-scale prediction models (cf. https://hwt.nssl.noaa.gov/sfe/2018/docs/HWT_SFE_ 2018_Prelim_Findings_v1.pdf). We create surrogate severe fields and validate against observed severe fields to compute FSS and Bias scores in C-SHiELD and plot the results as a function of UH threshold and smoothing radius (Figure 12), similar to Figure 17 in Sobash et al. (2016). For all versions of C-SHiELD, the highest FSS is found from the largest smoothing radius of 240 km and for UH thresholds of 150-200 m 2 s −2 , with slightly higher or lower thresholds giving similar skill scores. The UH threshold giving the best score for C-SHiELD is higher than in many other convective-scale models due to the significantly higher updraft helicities in FV3-based models (Potvin et al., 2019). This in turn is likely due to the emphasis on vorticity in the horizontal discretization as described in Harris et al. (2019).  has a high-frequency bias except for the very highest UH thresholds, as it is still too aggressive at creating strong storms.
We also investigate if skillful prediction of severe weather is possible beyond the first forecast day. Figure 13 shows surrogate severe FSS for Days 1 through 4 (Hours 12-36, 36-60, 60-84, and 84-108, respectively). The FSS value is not as high on later days as on the first, but even on Day 4, the FSS is still a respectable 0.74, indicating that there is skill in predicting severe weather multiple days in advance. These high skill scores may be partially due to the relatively large smoothing radius of 240 km.
These multiple-day severe weather forecasts are in the spirit of the convective outlooks issued by the Storm Prediction Center (www.spc.noaa.gov/products/outlook; Edwards, 2015) based on predictions of synoptic-scale environments favorable for severe weather. The advantage of using a dynamical convective-scale prediction model on medium-range time scales is that explicit prediction of storms, instead of just environments, potentially can give forecasts of convective modes and specific hazards.

S-SHiELD S2S Prediction
We briefly describe the characteristics of the Tier-2 S-SHiELD configuration, using a 25-km grid designed for climate integrations and for subseasonal and seasonal predictions. S-SHiELD is configured similarly to the 13-km SHiELD, although SHiELD's 2-day relaxation time scale of SSTs in the MLO toward the "frozen anomalies" is extended to 15 days in S-SHiELD. Unlike the vast majority of climate models, S-SHiELD is nonhydrostatic and uses a more sophisticated microphysics which is updated much more frequently. While these features do make S-SHiELD more expensive than analogous 25-km hydrostatic climate models (cf. Murakami et al., 2016;Roberts et al., 2018), previous experience with HiRAM (Chen & Lin, 2011Gao et al., 2018) has shown that nonhydrostatic dynamics and better microphysical-dynamical coupling yields a better representation of mesoscale convective systems and in particular of tropical cyclones, a major emphasis of our group's research.

Journal of Advances in Modeling Earth Systems
The MJO plays a major role in subseasonal variability but has been a challenge for many models to predict or even simulate reasonably (Kim et al., 2018). To explore the MJO prediction skill of S-SHiELD, we performed ninety-two 40-day predictions, one initialized at 00Z every 2 days from 1 October 2011 to 31 March 2012, covering the active Dynamics of the MJO (DYNAMO; Yoneyama et al., 2013) observation period. The Real-time Multivariate MJO Index (RMM; Wheeler & Hendon, 2004) is calculated for the hindcasts following the methodology of Xiang et al. (2015) and Vitart et al. (2017). For each hindcast, we compute daily-mean anomalies of outgoing longwave radiation and zonal wind at 200 mb (U200) and 850 mb (U850), averaged between 15°S and 15°N. These forecast anomalies are not bias corrected since we use observed climatology as reference instead of model climatology. We then subtract the averaged anomalies of the previous 120 days from the total anomalies to remove the signals of interannual and longer time scale variability; observed anomalies are appended to the anomalies in the hindcast. The normalized U200, U850, and outgoing longwave radiation anomalies are then projected onto the precomputed Empirical Orthogonal Functions from Wheeler and Hendon (2004) to obtain the two RMM indices.
We find that S-SHiELD with the MLO (Figure 14) has good skill (correlation > 0.7) out to 19 days and useful skill (correlation > 0.5) out to 28 days. The RMSE likewise shows similar skill (RMSE < ffiffi ffi 2 p out to 27 days). This skill may not be representative of other time periods given that skill is known to be higher during strong events (cf. Xiang et al., 2015) and the period of evaluation is relatively short. However, this result does give us confidence that S-SHiELD simulates the MJO well enough for useful S2S prediction. We plan to expand our evaluation of the MJO in forthcoming work.
The behavior of the MJO in GFDL's CMIP6-generation climate models (Zhao et al., 2018) suggests that the two keys for a good MJO simulation are an appropriate convection scheme and some form of interactive ocean, a result found also by DeMott et al. (2019) and others. A second set of S-SHiELD experiments was performed using specified climatological SSTs plus frozen anomalies. These simulations without the interactive MLO had much smaller RMM correlations, with predictions no longer useful after Day 20, and larger errors. The effect of the interactive ocean is made clear in Figure 15, in which S-SHiELD with the MLO correctly predicted the formation of all three strong MJO events during this period 10-15 days in advance and correctly propagated all events through the Maritime Continent (near 120°E longitude), although the propagation speed is slower than observed and there is some disruptions near the Maritime Continent. However, S-SHiELD with prescribed SSTs has difficulty propagating the MJO through the Maritime Continent and, for the November event, creates no MJO whatsoever. The November event proves particularly challenging for S-SHiELD without the MLO as it performs poorly at a range of lead times ( Figure S2) but poses no problem for S-SHiELD with the MLO. It is clear that the simple, inexpensive MLO used in S-SHiELD is sufficient to significantly extend the predictability of the MJO.
DeMott et al. (2019) did not describe any deficiencies of the MJO from models using a 1-D column ocean instead of a 3-D dynamical ocean, which suggests a limited role for direct feedbacks between ocean circulations and the MJO. However, they did not rule out indirect effects of the MJO on ocean circulation that could impact other S2S-time scale phenomena or MJO teleconnections. Other investigators have found that the MJO does alter ocean circulations on intraseasonal time scales, notably the result of Moum et al. (2014) found during DYNAMO. It remains to be seen whether these ocean dynamical effects of the MJO are of sufficient impact to affect S2S prediction skill. One advantage of the MLO is that we can nudge to climatological SSTs and so do not have climate drifts that challenge fully coupled models. Klingaman and DeMott (2020) found that climate models exaggerate the effect of ocean coupling on the MJO by overintensifying the MJO in El Niño years. S-SHiELD does not have a coupled dynamical ocean and nudges toward climatology and so can only represent the El Niño-Southern Oscillation state at

Journal of Advances in Modeling Earth Systems
initialization; indeed, the DYNAMO period was during a La Niña event (see https://origin.cpc.ncep.noaa. gov/products/analysis_monitoring/ensostuff/ONI_v5.php). Hence, this El Niño-Southern Oscillation contamination of the link between ocean coupling and the MJO is not present in S-SHiELD.
The diurnal cycle of precipitation is another challenge for climate models. Covey et al. (2016) found that nearly all climate models, even the 30-km resolution GFDL HiRAM, struggle with both the phase and amplitude of the diurnal cycle, especially over land and during boreal summer. Figure 16 presents the June-August (JJA) diurnal cycle from a 10-year S-SHiELD simulation with MLO SSTs nudged toward climatology, with results from 13-km SHiELD hindcasts shown for reference. We find that the observed phase of the diurnal cycle is beautifully matched by S-SHiELD, over both land and ocean. Most notably, the CONUS evening maximum of precipitation is reproduced. However, the amplitude of the cycle is biased low over land areas, possibly due to the inability of S-SHiELD's 25-km grid to produce the propagating mesoscale convective systems characteristic of heavier warm-season precipitation events. This appears to be a resolution effect as 13-km SHiELD reproduces both the correct phase and amplitude of precipitation. We also find that the majority of precipitation in S-SHiELD (55% globally and 80% between 20°S and 20°N) is from the SAS convective scheme, although this does not adversely affect the phase of the diurnal cycle. S-SHiELD does have the correct phase and amplitude (albeit slightly too high) of the diurnal cycle of the 2-m temperature over land ( Figure S3). Hagos et al. (2016) found that the diurnal cycle of cloudiness and precipitation plays a key role in the propagation of the MJO through the Maritime Continent. Since S-SHiELD has considerably better diurnal cycles of precipitation and temperature over land, especially over tropical land, than do most climate models, we might expect that this improved representation of the diurnal cycle may be contributing to the improved representation of the MJO seen above.

Conclusion and Prospects
We have developed the SHiELD modeling system as a research tool to demonstrate new capabilities of the FV3 Dynamical Core and of our physical parameterizations, develop new ideas in atmospheric prediction modeling, and to explore processes and phenomena within the atmosphere. Since late 2015 when FV3 was first coupled to the then-operational GFS Physics Driver, we have developed SHiELD into a promising vehicle for improving the prediction and understanding of atmospheric phenomena. SHiELD also demonstrates the potential and viability of unified modeling in which there is a single modeling system with one codebase, one executable, one preprocessor, one set of runscripts, and one set of postprocessing tools. This greatly simplifies the modeling suite and allows improvements to be exchanged between configurations.
The fundamental characteristics of SHiELD compared to previous-generation and existing operational models are documented in this and other publications. For some applications, we have previously demonstrated capabilities similar to that of existing modeling systems, such as severe-storm prediction in C-SHiELD  and tropical cyclone intensity prediction in T-SHiELD (Hazelton, Bender, et al., 2018;. We have shown significant improvements over existing models, especially over existing global models, for large-scale and hurricane prediction skill in 13-km SHiELD (Chen, Lin, Magnusson, et al., 2019;Zhou et al., 2019), and the diurnal cycle and MJO prediction in S-SHiELD. We have even shown entirely new possibilities for prediction modeling, such as skillful hurricane intensity forecasts in 13-km SHiELD , and the possibility of medium-range convective-scale prediction in C-SHiELD. Ultimately, the true strength of SHiELD is that all of these characteristics are demonstrated in the same modeling system.
SHiELD is designed to be an experimental research modeling system, with a particular set of scientific goals set by its developers, and thereby is more restricted in scope than the GFS, HAFS, RFS, and other general-purpose models intended for operational weather forecasting and to support broad audiences of users. While improved prediction skill is a major scientific goal and an important "vital sign" of model development, we also develop SHiELD as a means to demonstrate new modeling capabilities. SHiELD is also intended to be principally a physical atmosphere modeling system and is not intended for research into oceanic dynamics, decadal-to-centennial projection, biogeochemistry, or other topics taking place at either longer time scales or greater complexity than SHiELD is designed for. Improvements within SHiELD can be seamlessly transitioned into other FV3-based models that do address these topics, including other UFS models and the FV3-based coupled earth-system models at GFDL, within NASA, the National Center for Atmospheric Research, and elsewhere. As such, SHiELD's progress will continue to contribute to the development and improvement of these modeling systems. SHiELD is a part of GFDL's fourth-generation modeling suite (GFDL, 2019; Figures 1 and 2) and shares common infrastructure with CM4, Earth System Model version 4, and Seamless System for Prediction and Earth System Research. SHiELD uses a different physics suite and land model from the other GFDL configurations but otherwise is constructed similarly. Advances can then be exchanged between the configurations, allowing for mutual improvement, seamless cross-time scale modeling, and potentially unification of GFDL's weather and climate modeling efforts. A significant two-way interaction between SHiELD and other UFS configurations (GFS, HAFS, RFS, etc.) is taking and promises to continue driving furthered improvement of all UFS applications.
Further development of SHiELD, including both FV3 and the SHiELD physics, will continue to improve the prediction skill of the configurations, address issues which have been identified, and broaden the scope toward new applications. As computing power allows, models will be pushed to higher horizontal and vertical resolution, physical processes developed to improve simulation quality and prediction skill, and to address emerging scientific questions. New capabilities within FV3, including regional and doubly periodic domains, will permit efficient simulation of processes at kilometer and subkilometer scales for basic science and for process studies to improve physical parameterizations. We are also working on a native SHiELD data assimilation cycling system to take advantage of the new advances and to create initial conditions most consistent with the forward prediction model configurations. Finally, we will continue to develop our Tier-2 configurations, with near real-time S2S predictions being made using S-SHiELD, and continued extension into the global cloud-resolving regime (cf. Stevens et al., 2019) toward new scientific problems not adequately addressed by existing regional models or by coarse-resolution global models.

Appendix A: PD Advection Scheme
The Lagrangian dynamics in FV3 uses 1-D advection operators to build the 2-D advection scheme of Lin and Rood (1996). In hydrostatic FV3, these operators are typically monotonic (Lin, 2004), in that no new extrema are created by the advection; however, monotonic advection can be overly diffusive for some applications. In nonhydrostatic FV3, the monotonicity constraint is not used for advection of dynamical quantities (vorticity, heat, and air mass), but positivity still needs to be enforced for scalar tracers. We introduce a PD scheme, which uses a weaker constraint than monotonicity which only prevents the appearance of negative values.
This positivity constraint can be applied to any scheme similar to Van Leer (1974) or the Piecewise-Parabolic Method (PPM; Colella & Woodward, 1984) in which first-guess continuous edge values b q i þ 1=2 and b q i − 1=2 are interpolated from the cell-averaged values q i where i is a grid index. As with a standard monotonicity constraint, we break the continuity of the subgrid reconstructions across grid-cell interfaces, creating left-edge and right-edge values, Q − i and Q þ i , respectively, as well as a curvature value B oi for each grid cell, which are then used to compute the flux as in Putman and Lin (2007, Appendix B).
To adjust the edge values to ensure positivity, we use the algorithm below on cell i, where notation is as in Lin (2004, Appendix A): SHiELD grew out of a major collaboration between GFDL and EMC and would not have been possible without the physical parameterization suite, software, data, and especially input initial conditions and baseline forecasts made freely available by EMC and the National Weather Service. We thank Jongil Han for providing SA-SAS and George Gayno and Helin Wei for providing EMC preprocessing tools and land model inputs and for significant assistance with these tools and data sets. We also thank James Franklin (NHC, retired) for advice on the accuracy of the wind radii in the Best