A Machine Learning Bias Correction on Large‐Scale Environment of High‐Impact Weather Systems in E3SM Atmosphere Model

Large‐scale dynamical and thermodynamical processes are common environmental drivers of high‐impact weather systems causing extreme weather events. However, such large‐scale environmental conditions often display systematic biases in climate simulations, posing challenges to evaluating high‐impact weather systems and extreme weather events. In this paper, a machine learning (ML) approach was employed to bias correct the large‐scale wind, temperature, and humidity simulated by the atmospheric component of the Energy Exascale Earth System Model (E3SM) at ∼ 1° resolution. The usefulness of the ML approach for extreme weather analysis was demonstrated with a focus on three high‐impact weather systems, including tropical cyclones (TCs), extratropical cyclones (ETCs), and atmospheric rivers (ARs). We show that the ML model can effectively reduce climate bias in large‐scale wind, temperature, and humidity while preserving their responses to imposed climate change perturbations. The bias correction is found to directly improve water vapor transport associated with ARs, and representations of thermodynamical flows associated with ETCs. When the bias‐corrected large‐scale winds are used to drive a synthetic TC track forecast model over the Atlantic basin, the resulting TC track density agrees better with that of the TC track model driven by observed winds. In addition, the ML model insignificantly interferes with the mean climate change signals of large‐scale storm environments as well as the occurrence and intensity of three weather systems. This study suggests that the proposed ML approach can be used to improve the downscaling of extreme weather events by providing more realistic large‐scale storm environments simulated by low‐resolution climate models.


Introduction
General circulation models (GCMs) are the most common approach used in projecting climate change, including future changes in high-impact weather systems such as atmospheric rivers (ARs), tropical cyclones (TCs), and extratropical cyclones (ETCs), which have substantial societal and economic impacts (e.g., Angélil et al., 2016; Dai & Nie, 2022; Guan & Waliser, 2017; Merz et al., 2020; Moon et al., 2018; Seneviratne et al., 2012, 2023; Wehrli et al., 2018). However, achieving a proper representation of these weather systems for hazard assessment requires high spatial resolutions (on the order of a few kilometers) to realistically simulate the storm processes (e.g., convection), which is computationally demanding for global modeling (e.g., Kanada et al., 2017; Kanada & Wada, 2016; Kendon et al., 2014; Kitoh & Endo, 2016; Lucas-Picher et al., 2021; Mori et al., 2019; Nie et al., 2018; Willison et al., 2013). Consequently, downscaling techniques have been widely used in combination with GCMs at low resolution (on the order of hundreds of kilometers) to yield important scientific insights on past and future changes of high-impact weather events (e.g., Emanuel, 2013; Emanuel et al., 2006; Knutson et al., 2013, 2019, 2020; Lee et al., 2020). Downscaling approaches rely on the large-scale storm environments simulated by GCMs to project future changes of high-impact weather events through established statistical relationships (e.g., Balaguru et al., 2023; Colle et al., 2015; Dixon et al., 2016; Emanuel, 2013), or to provide boundary conditions for limited area or regional models to simulate the local climate and high-impact weather events (e.g., Fu et al., 2005; Giorgi et al., 1994; Gutowski et al., 2020). Accurate simulation of the large-scale storm environments by GCMs is therefore essential to achieve reliable downscaling to evaluate future changes in frequency, intensity, and characteristics of high-impact weather events. However, the large-scale storm environments governing regional to local-scale weather systems are often not well represented in the GCMs due to varying levels of systematic biases and uncertainties in representing smaller-scale processes that interact with the large-scale environments (e.g., Collins et al., 2013; Flato et al., 2013; Mueller & Seneviratne, 2014; Volosciuk et al., 2015; Zappa et al., 2013). As a result, GCM bias corrections have been an important research topic in downscaling studies, and many bias correction methods have been developed to provide more reliable regional climate information (e.g., Christensen et al., 2008; Deque, 2007; François et al., 2020; Gudmundsson et al., 2012; Vaittinada Ayar et al., 2021; Vrac et al., 2012; Z. Xu & Yang, 2012; Z. Xu et al., 2021). Studies have demonstrated that correction of the GCM mean bias may improve dynamical downscaling of local-scale high-impact weather systems such as tropical cyclones over the North Atlantic Ocean (e.g., Bruyère et al., 2014; Done et al., 2015).
In recent years, advances in machine learning (ML) techniques have enabled the application of modern artificial neural network architectures in bias correction and statistical downscaling of GCMs (e.g., Fulton et al., 2023; Han et al., 2021; Moghim & Bras, 2017; Steininger et al., 2020; W. Xu et al., 2021; F. Wang & Tian, 2022). Several types of ML approaches have been shown to reduce spatial and temporal biases in GCMs (Fulton et al., 2023; F. Wang & Tian, 2022). In this paper, we introduce an ML approach based on a long short-term memory (LSTM) neural network (Barthel Sorensen et al., 2024; Charalampopoulos et al., 2023) developed to bias correct the climate simulations produced by the U.S. Department of Energy (DOE) Energy Exascale Earth System Model (E3SM, Golaz et al., 2022). Specifically, the developed ML approach is employed to postprocess the large-scale wind (U, V), temperature (T), and humidity (Q) from long-term present-day and future climate simulations conducted with version 2 of the E3SM Atmosphere Model (EAMv2) at a horizontal grid spacing of ∼1°, driven by prescribed sea surface temperature and sea ice as lower boundary conditions. With an ultimate goal of improving the modeling of extreme weather events, we evaluate the impact of ML bias correction on large-scale storm environments and the long-term statistics of the high-impact weather systems simulated by EAMv2, with a focus on the ARs, TCs, and ETCs that frequently cause extreme weather events over the globe. Importantly, we also evaluate the impact of the ML bias correction on the responses of the three high-impact weather systems to future climate change. The goal of this study is to determine whether low-resolution climate models with ML bias correction provide more skillful simulations of large-scale storm environment conditions than those without bias correction, thus providing a path to improved assessments of extreme weather events.
In Section 2, we introduce the experimental design and ML bias correction method. In Section 3, we evaluate the impacts of the ML model on the GCM model biases and the GCM projected climate change signals in large-scale model states (i.e., U, V, T, Q). Then the long-term statistics of ARs, TCs, and ETCs and their associated large-scale storm environment with and without ML bias correction are compared and evaluated for the present-day climate (Section 4). Section 5 presents the impact of ML bias correction on the responses of ARs, TCs, and ETCs to climate change projected by pseudo global warming (PGW) simulations with and without postprocessing by the ML bias correction. Lastly, conclusions and discussions are given in Section 5.

A Brief Overview of E3SM Atmosphere Model (EAM)
E3SM is a global Earth system model developed by the U.S. Department of Energy (Leung et al., 2020) with the first version released in 2018 (Golaz et al., 2019). This study uses the E3SM Atmosphere Model version 2 (EAMv2, Golaz et al., 2022) at standard resolution (also referred to as the "low-resolution" configuration). In brief, EAMv2 uses separate computational grids for dynamics and column physics parameterizations. The dynamical core is configured with the "np4" cubed-sphere mesh with a horizontal resolution of ∼110 km to solve the equations for large-scale dynamics and tracer transport (e.g., Dennis et al., 2012; Taylor & Fournier, 2010). The column parameterizations are run on a "pg2" grid, which shares the element grid with the dynamics but has a 2 × 2 subgrid of quadrilaterals for a total of four columns per element (e.g., Herrington et al., 2019; Lauritzen et al., 2018). The key subgrid-scale physical parameterizations in EAMv2 include representations of deep convection (G. J. Zhang & McFarlane, 1995), turbulence and shallow convection (Golaz et al., 2002; Larson et al., 2002), cloud microphysics (Gettelman & Morrison, 2015; Morrison & Gettelman, 2008; Y. Wang et al., 2014), aerosol life cycle (Liu et al., 2016; H. Wang et al., 2020), and shortwave and longwave radiation (Iacono et al., 2008; Mlawer et al., 1997). In addition, EAMv2 is interactively coupled with a land model (Oleson et al., 2013) that uses the same "pg2" grid for column parameterizations. EAMv2 is configured with 72 vertical layers, extending from the Earth's surface to ∼0.1 hPa (∼64 km). The vertical grid spacing is uneven, with the layer thickness typically ranging from 20 to 100 m near the surface and up to 600 m near the model top.

Model Simulation
The simulations conducted for this study use prescribed sea surface temperatures and sea-ice concentrations, following the Atmospheric Model Intercomparison Project protocol (AMIP, Gates et al., 1999). Table 1 summarizes the key configurations of these E3SMv2 simulations. The first group (Group 1) consists of one baseline simulation (i.e., "CLIM") and two PGW simulations (i.e., "SSP245" and "SSP585"). CLIM is a present-day free-running simulation driven by prescribed observed monthly mean sea surface temperature (SST) and sea ice concentration (SIC) from the input4MIPs data sets (Reynolds et al., 2002) as the lower boundary conditions. Other external forcing data, including volcanic aerosols, solar variability, concentrations of greenhouse gases, and anthropogenic emissions of aerosols and their precursors, were prescribed following the World Climate Research Program Coupled Model Intercomparison Project-Phase 6 (CMIP6, Eyring et al., 2016; Feng et al., 2020; Hoesly et al., 2018). Emissions of aerosols and their precursor gases were set to the values of the year 2010 to represent the present-day condition. The simulation was run from 1 January 1978 to December 2014. The first year of model output was discarded as model spin-up, and the remaining 36 years of model output were used for analysis.
SSP245 and SSP585 are two EAMv2 simulations conducted with the PGW approach, which has been widely used in climate modeling (e.g., Schär et al., 1996; Xue et al., 2023). The PGW approach has proven to be a useful experiment strategy that enables targeted exploration of regional impacts from future climate change while avoiding the large ensembles typically required to address internal variability (Xue et al., 2023). In a PGW experiment, the large-scale changes in the climate system are imposed on a control climate simulation by modifying the boundary conditions. In this study, the imposed climate change perturbations were added to the SST and SIC boundary conditions of CLIM, which represent present-day climate conditions. Specifically,
• SSP245: the patterned SST and SIC perturbations (i.e., Δ) associated with the Shared Socioeconomic Pathways 2-4.5 scenario were added to the SST and SIC boundary conditions used in CLIM. Specifically, the monthly mean SST and SIC model outputs during the 1991-2010 (present-day) and 2041-2060 (future climate) periods were extracted from the coupled simulations conducted with 15 CMIP6 models (see Table A1). The climatological means of the monthly SST and SIC over the two periods were then computed by averaging each quantity over the 20 years, and ΔSST and ΔSIC were derived as the difference between the future and present-day climatological mean SSTs and SICs for each month and each grid point. Finally, the multi-model ensemble mean (MME) of ΔSST and ΔSIC was computed and added to the SST and SIC prescribed in the CLIM simulation as climate change perturbations for the pseudo SSP245 global warming experiment. We note that the monthly mean ΔSST and ΔSIC were added to each corresponding month to preserve the monthly and seasonal cycles of ΔSST and ΔSIC. Overall, the SSP245 perturbations correspond to a 1-2 K warming in the annual mean SST and a 5%-10% reduction in the annual mean SICs in most regions over the globe compared to the observed SST and sea ice in the present-day climate (see Figures A1a and A1b).
• SSP585: same as SSP245, except that the CMIP6 coupled simulations conducted under the Shared Socioeconomic Pathways 5-8.5 scenario during 2041-2060 were used to derive the SST and SIC perturbations for the PGW experiment. Compared to SSP245, SSP585, with an end-of-century forcing of 8.5 W m⁻² instead of 4.5 W m⁻², results in stronger warming of SSTs and larger reductions of SICs (see Figures A1c and A1d).
Note (Table 1). The three simulations in Group 1 were conducted with the coupled atmosphere-land components of EAMv2 with the default "low-resolution" configuration at ∼110 km grid spacing (see Golaz et al., 2022). All simulations were conducted with prescribed sea surface temperature (SST) and sea ice concentration (SIC) (see context in Section 2.2 for details). Group 2 consists of three simulations that apply ML bias corrections to the corresponding simulations in Group 1. Here "pseudo 1979-2014" refers to simulations driven by SST and SIC with climate perturbations corresponding to the difference between (2041-2060) and (1991-2010) added to the SST and SIC of the CLIM period of 1979-2014.
We note that the two PGW simulations, SSP245 and SSP585, used the same configurations as CLIM, except with imposed climate change perturbations added to the boundary conditions of SST and SIC. Therefore, the differences between SSP245 (SSP585) and CLIM were derived in this study to represent the model responses (e.g., large-scale storm environment and statistics of extreme events) to the climate-change-induced SST and SIC perturbations.
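The construction of the PGW boundary conditions described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the processing code used for the simulations: the function names, array layouts, and synthetic data are hypothetical, and the sketch covers SST only (SIC would follow the same steps).

```python
import numpy as np

def pgw_perturbation(present, future):
    """Multi-model ensemble mean of monthly climatological differences.

    present, future: arrays shaped (n_models, n_years, 12, nlat, nlon)
    (hypothetical layout). Returns an array shaped (12, nlat, nlon).
    """
    present_clim = present.mean(axis=1)   # multi-year mean per model and month
    future_clim = future.mean(axis=1)
    delta = future_clim - present_clim    # per-model monthly perturbation
    return delta.mean(axis=0)             # average over models (MME)

def apply_perturbation(clim_sst, delta_mme):
    """Add the monthly perturbation to every year of the baseline forcing.

    clim_sst: array shaped (n_years, 12, nlat, nlon).
    """
    return clim_sst + delta_mme[np.newaxis, ...]

# Synthetic stand-in data: 15 models, two 20-year periods, a tiny 4 x 8 grid.
rng = np.random.default_rng(0)
n_models, n_years, nlat, nlon = 15, 20, 4, 8
present = 290.0 + rng.normal(0.0, 0.5, size=(n_models, n_years, 12, nlat, nlon))
warming = np.linspace(1.0, 2.0, n_models)   # hypothetical per-model warming (K)
future = present + warming[:, None, None, None, None]

delta_sst = pgw_perturbation(present, future)
clim_sst = 288.0 + rng.normal(0.0, 0.5, size=(36, 12, nlat, nlon))
pgw_sst = apply_perturbation(clim_sst, delta_sst)
```

Because the perturbation keeps its month dimension until it is added, each calendar month of the baseline receives the perturbation of the same month, which is how the seasonal cycle of ΔSST is preserved.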

Machine Learning Bias Correction
The second group (Group 2) consists of three simulations that are the same as Group 1 except that the machine learning (ML) model was used to post-process the three simulations in Group 1 to bias correct the EAMv2 simulations of the present-day and future climates. In brief, the ML model employed for bias correction was proposed by Barthel Sorensen et al. (2024), in which a neural network (NN) operator acts on the coarse-resolution climate model simulation in a postprocessing manner. The NN operator was trained to learn a map between a coarse-resolution EAMv2 historical simulation and the ERA5 reanalysis data (Hersbach et al., 2020) that represents the real atmosphere, allowing for the correction of large-scale dynamics of the EAMv2 model. A full description of the mathematical framework for the ML model is included in Appendix A2. Specifically, the ML training input was provided by a nudged simulation instead of arbitrary coarse trajectories from the free-running simulations (i.e., CLIM). Here, the nudged simulation was conducted with the same configuration of EAMv2 as in CLIM, except that the model state (i.e., U, V, T, Q) was constrained toward the ERA5 reanalysis that has been remapped to the EAMv2 horizontal grid following S. Zhang et al. (2022) and Sun et al. (2019). With nudging, the trajectory of the model state, which predominantly obeys the dynamics of the coarse-resolution EAMv2 model, is constrained from systematically as well as chaotically diverging from the ERA5 reanalysis. To counteract the artificial dissipation introduced by the nudging tendency, the spectrum of the nudged solution is corrected to match the free-running EAMv2 model (see Equations A2-A5 in Appendix A2). Training on this specific pair of trajectories (the spectrally corrected nudged EAMv2 solution and the ERA5 data) allows the network to learn a map from the chaotic attractor of the coarse-resolution EAMv2 model to that of the reference data (i.e., ERA5 reanalysis) without being corrupted by chaotic divergence. At test time, the correction operator is applied to the output of the free-running EAMv2 simulation, which is mapped into a trajectory residing in the attractor of the reference data. More detailed information on the development of the ML model can be found in Barthel Sorensen et al. (2024).
The performance of the ML model in correcting large-scale model states, including U, V, T, and Q, has been verified in Barthel Sorensen et al. (2024) using EAMv2 historical simulations from 1979 to 2014. It is found that the ML model can correct the coarse E3SM output to closely reflect the 36-year ERA5 statistics for all prognostic variables and significantly reduce their spatial biases (see Section 4 of their paper). In this study, the same ML model was applied to bias correct the 3-hourly instantaneous U, V, T, and Q model output at each grid point and 72 model levels from the three simulations in Group 1. The pair of EAMv2 simulations with and without ML bias correction during the historical period of 1979-2014, that is, CLIM and ML (CLIM), are the same as those verified in Barthel Sorensen et al. (2024). In addition, we applied the ML bias correction to the PGW simulations for future climate. The bias-corrected simulations are referred to as ML (CLIM), ML (SSP245), and ML (SSP585) in Table 1 and throughout the rest of this manuscript. Here, no new training was carried out for the bias correction of future scenarios. Therefore, the implied hypothesis is that the bias correction model trained using the historical simulation can be applied to correct similar biases in future climate scenarios. Further analyses and evaluations in this regard will be presented in Section 3.2.
Overall, whereas Barthel Sorensen et al. (2024) focused on the development and assessment of the ML methodology, this paper presents a detailed comparison between the Group 1 and Group 2 simulations to quantify the effects of the ML bias correction on the EAMv2 simulations of mean climate, extreme events, and climate change signals, which will be presented in Sections 3-4.

Analysis Strategy
As discussed above, the current ML bias correction model was only designed to correct the model state variables of U, V, T, and Q in the EAMv2 model output to improve the large-scale atmospheric flows. Therefore, the discussion in Sections 3-4 will rely on the metrics that can be derived from these model state variables through offline diagnostic equations or a post-processing diagnostic package (see Appendix B). Other climate variables such as precipitation can be valuable for climate communities and impact studies, but they have not been included in our ML bias correction yet and therefore will not be discussed in this study. The limitations of our ML bias correction approach will be further discussed in Section 5.
In addition, the raw data sets from the simulations in Table 1 were the 3-hourly output of U, V, T, Q during 1979-2014 on the EAMv2 model grid as mentioned in Section 2.1. To facilitate the analysis in this study, the data were post-processed onto a 1.5° × 1.5° normal lat-lon grid with bilinear interpolation. A similar remapping process was also applied to the ERA5 reanalysis data for comparison and the derivation of model bias metrics in Sections 3-4. All quantities derived from U, V, T, and Q, as discussed in Appendix B, were diagnosed or processed using the 3-hourly data. The average of all 3-hourly data samples in each month was computed to form the monthly mean data sets during 1979-2014, and these monthly mean data sets were further used to derive the seasonal and annual means for the diagnostic figures in Section 3. Moreover, the diagnostic metrics in Section 4 for TCs, ETCs, and ARs were generated with the algorithms and functions provided by the TempestExtremes package (P. A. Ullrich et al., 2021). The input to TempestExtremes is 3-hourly data derived from the U, V, T, and Q model output. The detailed configuration used by our study for TempestExtremes is described in Appendix B. A similar approach as mentioned above was used to process the high-frequency diagnostics from TempestExtremes into monthly, seasonal, or annual mean metrics for discussions in Section 4. Extra information on plotting details was also added to the captions under each figure in Sections 3-4.
Finally, the two-tailed Student's t-test was employed for the significance test on the 2-D plots (stippling overlaid on the figures) in Sections 3 and 4. The significance test in our study was performed in a consistent manner: we first processed the annual mean values at each grid point of the 2-D map over 36 years from 1979 to 2014, and then we calculated the sample averages and variances from the 36 annual mean values and generated the probabilities for the Student's t-test. Grid points where the p-value falls below the significance level of 0.1 or 0.05 (provided in the caption of the specific figure) are dotted to indicate that the means are statistically different at these grid points. While more rigorous statistical tests (e.g., Wilks, 2016) may be used to determine the significance of the differences, the Student's t-test results in combination with our understanding of the physical processes represented in E3SM are helpful for interpreting the differences between the results from EAMv2 simulations with and without ML bias correction.
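As an illustration of the grid-point significance test described above, the following sketch builds a stippling mask from two sets of 36 annual-mean maps. It assumes an independent two-sample test with equal variances (the text does not state whether the samples are treated as paired), uses SciPy's `ttest_ind`, and relies on synthetic data; the function name `significance_mask` is hypothetical.

```python
import numpy as np
from scipy import stats

def significance_mask(annual_a, annual_b, alpha=0.05):
    """Return True where the two sample means differ at significance level alpha.

    annual_a, annual_b: arrays shaped (n_years, nlat, nlon) of annual means.
    """
    # Two-tailed, two-sample Student's t-test applied along the year axis.
    _, p_value = stats.ttest_ind(annual_a, annual_b, axis=0)
    return p_value < alpha

# Synthetic stand-in data: 36 annual means on a tiny 6 x 12 grid.
rng = np.random.default_rng(0)
n_years, nlat, nlon = 36, 6, 12
base = rng.normal(0.0, 1.0, size=(n_years, nlat, nlon))
other = rng.normal(0.0, 1.0, size=(n_years, nlat, nlon))
other[:, :3, :] += 5.0   # impose a large mean shift in the first three rows

mask = significance_mask(base, other)   # grid points that would be stippled
```

With 36 samples per grid point, the shifted rows are flagged everywhere, while the unshifted rows are flagged only at roughly the false-positive rate set by `alpha`.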

Bias Correction on Historical Simulation
A thorough evaluation of the performance of the ML bias correction in improving the mean climate statistics was detailed in Barthel Sorensen et al. (2024). In this section, we present a brief evaluation of the ability of the ML approach to reduce the mean climate biases in EAMv2, which will be used to facilitate our discussion in the next subsection and Section 4. Figure 1 shows the zonal mean and annual mean biases in the zonal wind (U), temperature (T), and specific humidity (Q) fields of the EAMv2 simulations without and with ML bias correction. The metrics were derived by comparing CLIM and ML (CLIM) in Table 1 with the ERA5 reanalysis during the 1979-2014 period. The biases in the U, T, and Q fields from ML (CLIM) (Figures 1b, 1e, and 1h) are systematically smaller than those in CLIM (Figures 1a, 1d, and 1g) over most regions and vertical levels, meaning that the ML model leads to promising bias reduction in the mean climate fields simulated by EAMv2. We also note that the performance of the ML bias correction varies with the quantities and spatial locations. Comparing ML (CLIM) with CLIM, more significant bias corrections are seen in the temperature (Figure 1d versus Figure 1e) and humidity fields (Figure 1g versus Figure 1h), while relatively weaker bias reductions are seen in the wind fields (Figure 1a versus Figure 1b). In addition, more promising bias corrections are seen in the near-surface levels relative to the upper model levels in most regions over the globe (also see Figure C1 in Appendix), especially for the wind fields (Figure 1a versus Figure 1b). Further investigation of the profiles of the global and annual mean standard deviation of the U, T, and Q fields (Figures 1c, 1f, and 1i) from the ERA5 reanalysis shows a clear vertical structure in the variability of these fields. In particular, the standard deviation of U (Figure 1c) in the upper troposphere (10-15 m s⁻¹) is about 3-5 times larger than that at near-surface levels (3-5 m s⁻¹). The differences in variability with altitude in these fields could limit the skill of the ML bias correction model at those levels.
Figure 1. Zonal and annual mean model biases in the zonal wind (U, unit: m s⁻¹, panels a-b), temperature (T, unit: K, panels d-e), and water vapor mixing ratio (Q, unit: g kg⁻¹, panels g-h) averaged from 1979 to 2014. Shown are the results from free-running (i.e., CLIM, first column) and ML bias-corrected (i.e., ML (CLIM), second column) EAMv2 simulations. The biases are derived by comparing annual mean quantities between EAMv2 simulations and the ERA5 reanalysis. The dotted region indicates that the differences are significant at a 95% confidence level from a Student's t-test. The third column shows the standard deviation of U, T, and Q at each pressure level from the ERA5 reanalysis. The log-linear algorithm is used to interpolate the EAMv2 data on the hybrid sigma-pressure level to the pressure level for comparison with the ERA5 reanalysis. Details of the simulation setups can be found in Table 1.
For a more quantitative evaluation, Figure 2 further shows the mean biases and root-mean-square error (RMSE) of selected physical quantities from CLIM and ML (CLIM). The mean biases are normalized by the observed values, while the RMSE is normalized by the RMS of the observed values to demonstrate the relative rank of the biases in different variables. We can see that the ML bias correction effectively reduces the biases by 10%-20% in the wind, temperature, and humidity fields over the globe (Figure 2a), especially in the mid-latitude regions. Meanwhile, the global and regional patterns of large-scale wind, temperature, and humidity fields are also systematically improved, as evidenced by the RMSE metrics (Figures 2c and 2d). With these results, we conclude that the ML approach is capable of reducing biases in the mean climate simulated by EAMv2, which may produce a more realistic representation of the large-scale dynamical and thermodynamical fields associated with extreme events.
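The two normalized metrics discussed above can be sketched as follows. This is a minimal illustration, not the diagnostic code behind Figure 2: the cosine-latitude area weighting, the function names, and the synthetic fields are assumptions, and the sketch operates on full fields rather than on anomaly patterns.

```python
import numpy as np

def area_weights(lat_deg, nlon):
    """Cosine-latitude weights on a regular lat-lon grid, normalized to sum to 1."""
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones((1, nlon))
    return w / w.sum()

def normalized_mean_bias(model, obs, weights):
    """Area-weighted mean bias divided by the area-weighted mean of the observations."""
    return np.sum(weights * (model - obs)) / np.sum(weights * obs)

def normalized_rmse(model, obs, weights):
    """Area-weighted RMSE divided by the area-weighted RMS of the observations."""
    rmse = np.sqrt(np.sum(weights * (model - obs) ** 2))
    rms_obs = np.sqrt(np.sum(weights * obs ** 2))
    return rmse / rms_obs

# Synthetic stand-in fields on a 1.5-degree grid (120 x 240 points).
lat = np.linspace(-89.25, 89.25, 120)
weights = area_weights(lat, 240)
obs = 280.0 + 10.0 * np.cos(np.deg2rad(lat))[:, None] * np.ones((1, 240))
model = 1.02 * obs   # hypothetical model field that is 2% too high everywhere

bias = normalized_mean_bias(model, obs, weights)
rmse = normalized_rmse(model, obs, weights)
```

Normalizing in this way makes the metrics dimensionless, so quantities with different units (for example, hPa for pressure and g kg⁻¹ for humidity) can be ranked on a common axis.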

Impact of Bias Correction on Mean Climate Change Signals
As discussed in Section 2.2, the ML model trained with EAMv2 nudged simulations and ERA5 reanalysis during the historical period is directly used to correct the EAMv2 simulations for historical and future climate scenarios. Unlike many previous studies (e.g., Chen et al., 2020, 2021; Gutiérrez et al., 2019; Teutschbein & Seibert, 2012), which assumed that the bias correction was identical in present-day and future climates, our ML bias correction does not assume that the biases in climate model simulations are independent of the mean climate states. In other words, we do not assume that the error correction terms computed in the present-day climate can be simply added to the future runs. It is likely, however, that the large-scale structure and magnitude of model biases are very similar between the climates of the two time periods. If this is true, the bias correction should not significantly interfere with the large-scale climate change signals resulting from the imposed perturbations for the PGW simulations. In this section, we demonstrate that this is the case for our employed ML model by checking the features of the large-scale climate change signals before and after the ML bias correction.
Figure 3 shows the spatial distribution of the temperature changes at 850 hPa due to the imposed climate change perturbations in SST and SIC for the SSP2-4.5 (top row) and SSP5-8.5 (bottom row) future scenarios. The patterns and magnitude of changes in near-surface temperature comparing the PGW and CLIM simulations (Figures 3a and 3c) are largely consistent with the PGW perturbations of SST (Figures A1a and A1c). These responses are expected due to the direct impact of the prescribed SST perturbations on the temperatures in the lower atmosphere. Compared with the climate change signals without bias correction (Figures 3a and 3c), the ML bias correction applied to CLIM and the PGW simulations overall does not significantly modify the patterns and magnitude of 850-hPa temperature responses in most regions over the globe (Figures 3b and 3d). An exception is the regions around 30°S and 30°N, where a moderate modification of the magnitude of 850-hPa temperature changes is observed (e.g., Figure 3c versus Figure 3d). Further analysis indicates that the ML model applies corrections to the large cold temperature biases over these regions (see Figures 1d-1f) during the historical period. Such corrections from the ML model are expected to take effect in the SSP245 and SSP585 simulations as well. Given the non-linear processes in the atmosphere, the moderate adjustment of the climate change signals in these regions is not unexpected.
Figure 2. [...] and [30-60S] latitude bands, panel c) from EAMv2 simulations without (i.e., "CLIM" in red) and with (i.e., "ML (CLIM)" in blue) ML bias correction, normalized by the observed value (i.e., ERA5 reanalysis); Second row: same as the first row, but for root-mean-square errors (RMSE) of anomaly patterns between EAMv2 simulations and observations, normalized by the root-mean-square (RMS) of the observed values. All metrics are calculated using the monthly mean model output and ERA5 reanalysis (i.e., observation) from 1979 to 2014. The y-axis shows the selected physical quantities, including surface pressure (PS, unit: hPa), sea level pressure (PSL, unit: hPa), zonal wind (U, unit: m/s) and temperature (T, unit: K) at the bottom model level and the 850-, 500-, and 200-hPa pressure levels, as well as the specific humidity (Q, unit: g/kg) at the 925-, 850-, 500-, and 200-hPa pressure levels. The log-linear interpolation is used to regrid the EAM model output from the hybrid sigma-pressure levels to pressure levels for comparison with the ERA5 reanalysis on pressure levels. Details of the simulation setups can be found in Table 1.
Figure 3. [...] -CLIM for EAMv2 simulations without ML bias correction, as well as (b) ML (SSP245)-ML (CLIM) and (d) ML (SSP585)-ML (CLIM) for EAMv2 simulations with ML bias correction. The dotted regions indicate that the differences are significant at a 95% confidence level. SSP245 and SSP585 denote two PGW EAMv2 simulations with imposed climate change perturbations in sea surface temperature (SST) and sea-ice concentrations (SIC) derived from the CMIP6 historical simulations following SSP2-4.5 and SSP5-8.5 future scenarios, respectively. A detailed description of the simulations can be found in Table 1.
To further study the correction patterns between the present-day climate and future scenarios, the probability density functions (PDFs) of monthly temperature and humidity at 850 hPa during 1979-2014 are shown in Figure 4. Consistent with the results in the previous section, we can see that the ML bias correction adjusts the PDF of CLIM in the present-day climate toward the PDF of the ERA5 reanalysis data (dashed and solid blue lines vs. gray bars) for both near-surface temperature and humidity fields. For future climate scenarios, consistent with the imposed positive radiative forcing associated with the SSP scenarios, EAMv2 simulations both with and without ML bias correction predict warmer 850-hPa temperatures and higher specific humidity relative to present-day conditions (red lines vs. blue lines). Therefore, the physical climate change effects, that is, shifting toward warmer temperatures, are not erroneously removed as systematic biases by the ML model. More importantly, the differences between the ML-corrected future scenarios and present-day climate (solid blue and red lines) are very similar to those between the uncorrected data sets (dashed blue and red lines, see also Figure C2), suggesting that the ML bias correction does not significantly interfere with the PGW-induced climate change signals. In fact, we note that the PDFs of T850 and Q850 after ML bias correction (i.e., ML (SSP245) and ML (SSP585), solid red lines) are also quantitatively closer to the ERA5 global statistics (gray bars), compared with the uncorrected free-running simulations (i.e., SSP245 and SSP585, dashed red lines). This implies that the ML bias correction constrains the EAMv2 simulations with corrections of the same sign and similar magnitude in both present-day and future climate simulations. For instance, the near-surface humidity correction shifts the right-side tail of the distribution by a similar amount to that seen in the present-day results (Figures 4b and 4d). Overall, it is encouraging that the ML model effectively reduces the model biases in large-scale dynamical and thermodynamical atmospheric conditions while introducing insignificant interference with the climate change signals (that is, preserving the climate change signals imposed from external forcing).
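The comparison described above, in which the climate change signal is computed before and after bias correction, can be illustrated with a toy example. Everything here is hypothetical: `toy_correction` is a simple stand-in with a mild state dependence, not the LSTM-based operator, and the temperature samples are synthetic. The sketch only shows the form of the diagnostic.

```python
import numpy as np

def empirical_pdf(values, bins):
    """Histogram-based estimate of the probability density on the given bin edges."""
    density, _ = np.histogram(values, bins=bins, density=True)
    return density

def change_signal(future, present):
    """Mean climate change signal: future minus present-day mean."""
    return future.mean() - present.mean()

def toy_correction(x):
    """Hypothetical stand-in for a bias correction (NOT the ML model of this study)."""
    return x + 1.0 + 0.05 * (x - 280.0)

# Synthetic 850-hPa temperature samples (K) for a present-day and a future run.
rng = np.random.default_rng(0)
clim = rng.normal(279.0, 5.0, size=10_000)
ssp = clim + 2.0   # hypothetical uniform 2 K warming

signal_raw = change_signal(ssp, clim)
signal_ml = change_signal(toy_correction(ssp), toy_correction(clim))

bins = np.linspace(250.0, 320.0, 71)   # 1 K wide bins
pdf_clim = empirical_pdf(clim, bins)
pdf_ml_clim = empirical_pdf(toy_correction(clim), bins)
```

In this toy setting the corrected signal differs from the raw signal only through the state-dependent part of the correction (a factor of 1.05), which mirrors the argument that a correction of similar sign and magnitude in both climates leaves the imposed change largely intact.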

Impact of Bias Correction on Statistics of Extreme Weather Events
In this section, we further discuss the value of ML bias correction for the study of extreme weather events and their underlying processes in EAMv2. We selected three types of high-impact weather systems to analyze: atmospheric rivers (ARs), extratropical cyclones (ETCs), and tropical cyclones (TCs). The evaluation metrics rely on feature tracking using the TempestExtremes package (P. A. Ullrich et al., 2021), and are detailed in Appendix C. These three types of systems are of interest partly because they have the potential to generate extreme weather events, and they operate at spatial and temporal scales that are largely resolved (e.g., ARs), or under-resolved (e.g., ETCs and TCs) by low-resolution climate models (e.g., EAMv2) at ∼1° horizontal resolution. With analyses of these events, we aim to demonstrate the value of the ML bias correction for improving simulations and projections of extreme weather events by typical GCMs.

Atmospheric Rivers (ARs)
Atmospheric rivers (ARs) are characterized by intense moisture transport, which, upon landfall, can produce precipitation that can be both beneficial and destructive (Payne et al., 2020). This is because the precipitation rate is proportional to the convergence of the zonal and meridional moisture transport (model output fields TUQ and TVQ, respectively) associated with ARs (e.g., Mo et al., 2021). The major features of ARs are reflected by TUQ and TVQ, which, as in Equation B5, are directly linked to large-scale wind and specific humidity. Therefore, the AR systems are expected to be well simulated by EAMv2 at ∼1° resolution. Despite the model being capable of resolving ARs, biases still exist in the simulated ARs. As reported in Kim et al. (2022), version 1 of the E3SM model overestimates the occurrence frequency and the water vapor transport of ARs. Therefore, it is worth checking whether EAMv2 with the ML bias correction can reduce these AR biases. Here, TempestExtremes is employed to track ARs using the 6-hourly TUQ and TVQ fields derived from the EAMv2 simulations with and without ML bias correction (see detailed tracking algorithm in Appendix B). The occurrence frequency of ARs and vertically integrated horizontal water vapor transport (IVT) associated with ARs are then calculated and shown in Figure 5. We can see that the significant overestimation of IVT in E3SM v1 still exists in the EAMv2 model (Figure 5b), meaning that spuriously large moisture transport associated with ARs persists in both versions of the EAM model. These model biases can introduce biases in the AR-driven precipitation in model simulation as found in previous studies (Kim et al., 2022).
Figure 4. Long-term statistics for monthly mean air temperature (first column, unit: K) and specific humidity (second column, unit: g kg⁻¹) at 850-hPa pressure levels over the whole global domain during the simulation period of 1979-2014. Shown is the comparison among ERA5 reanalysis (gray bars), uncorrected (dashed lines), and ML-corrected (solid lines) EAMv2 simulation for present-day (blue lines) and future climate scenarios (red lines). The future climate simulations with SSP2-4.5 and SSP5-8.5 perturbations are shown in the top and bottom rows, respectively. The detailed descriptions of the simulations can be found in Table 1.
With the ML bias correction, the spuriously large moisture transports associated with ARs are significantly reduced in the EAMv2 simulations (Figure 5c). The remaining model biases in the composite IVT field are statistically insignificant in most regions over the globe. Following Equation B5, such improvements in ARs are obtained because of the effective bias reductions by the ML model in both large-scale wind and humidity fields (Figures 1h and 1i). Consistently, the AR annual occurrence frequency also agrees better with ERA5 reanalysis after the ML bias correction (Figure 5e versus Figure 5f). Note that TempestExtremes uses the Laplacian of IVT instead of an IVT threshold for AR tracking. Therefore, biases in large-scale humidity on their own are not responsible for the AR frequency biases in Figure 5e. The improvements in the occurrence frequency of ARs suggest that the ML bias correction not only modifies the IVT value at each grid point but also inherently improves the gradient of IVT simulated by EAMv2.
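The distinction between an IVT threshold and the Laplacian of IVT can be made concrete with a small sketch. It assumes that the IVT magnitude is the vector norm of TUQ and TVQ, and it uses a flat Cartesian grid with finite differences. This is not the TempestExtremes implementation, and the helper names `ivt_magnitude` and `laplacian` are hypothetical.

```python
import numpy as np

def ivt_magnitude(tuq, tvq):
    """IVT magnitude from zonal (TUQ) and meridional (TVQ) vapor transport."""
    return np.hypot(tuq, tvq)

def laplacian(field, dy=1.0, dx=1.0):
    """Finite-difference Laplacian on a flat grid (no spherical metric terms)."""
    d2y = np.gradient(np.gradient(field, dy, axis=0), dy, axis=0)
    d2x = np.gradient(np.gradient(field, dx, axis=1), dx, axis=1)
    return d2y + d2x

# Synthetic transport fields on a 20 x 30 grid (illustrative values only).
y, x = np.meshgrid(np.arange(20.0), np.arange(30.0), indexing="ij")
tuq = 200.0 * np.exp(-((y - 10.0) ** 2) / 8.0)   # a zonally elongated filament
tvq = 0.25 * tuq
ivt = ivt_magnitude(tuq, tvq)

# A spatially uniform IVT bias changes every grid-point value ...
biased = ivt + 50.0
# ... but leaves the Laplacian, and hence a Laplacian-based detection, unchanged.
same_laplacian = np.allclose(laplacian(ivt), laplacian(biased))
```

A bias that changes the spatial structure of IVT, in contrast, does alter the Laplacian, which is consistent with the interpretation that the correction improves the IVT gradients.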
Figure 6 further shows the responses of the IVT and occurrence frequency of ARs to the climate change perturbations in SSTs and SICs used in the PGW EAMv2 simulations. The differences between SSP245 and CLIM suggest an increase of the IVT (Figure 6a) and the occurrence of AR events (Figure 6c), which can be explained by the higher atmospheric humidity associated with warmer temperature as shown in Figure 4. However, circulation changes such as changes in the jet stream and subtropical high-pressure systems (Kim et al., 2022) likely also play a role since, as noted earlier, ARs are tracked based on the Laplacian of IVT instead of an IVT threshold, so an increase in atmospheric humidity alone does not translate to more frequent AR occurrence. These climate change signals are also seen in the same pair of simulations with the ML bias correction (ML (CLIM) and ML (SSP245)), indicating that the bias correction preserves the climate change signals. Compared with the free-running E3SM simulations, the ML bias correction results in a weaker increase of IVT and occurrence frequency of AR over the Northeast Pacific and Southern Ocean regions, which is likely due to the correction by the ML model of the overestimation of IVT in the E3SM model. Similar responses in the intensity (in terms of IVT) and occurrence frequency of ARs are also seen in the PGW simulations with stronger imposed climate changes in SST and SIC (i.e., SSP585), but the magnitudes of change in IVT and occurrence of ARs are more pronounced due to the stronger external forcing in SST and SIC (Figure C3). Again, the ML bias correction preserves the climate change signals, while adjusting the strength of the responses in IVT associated with ARs. Overall, the results in this section suggest that the ML bias correction reduces the systematic model biases in large-scale wind and humidity and improves the representation of ARs in EAMv2. Meanwhile, the ML bias correction does not have a significant impact on the climate change signals associated with ARs derived from the PGW simulations. By eliminating the systematic model biases, the bias-corrected AR environments provide more reliable information for downscaling ARs for assessing future changes in precipitation and flood hazards associated with ARs.

Extratropical Cyclones (ETCs)
Extratropical cyclones (ETCs) are a fundamental part of the atmospheric circulation that modulates the transport of heat, moisture, and momentum in the mid-latitudes (Hawcroft et al., 2012; Sinclair et al., 2020). ARs discussed in the previous section are typically associated with a low-level jet stream ahead of the cold front of an ETC. The heavy precipitation and strong winds accompanying ETCs are known to cause extreme weather-induced damage in midlatitude regions such as Europe and North America (Fink et al., 2009; Hoskins & Hodges, 2002).
We begin our discussion by showing the track densities of ETCs that are tracked with TempestExtremes using the 6-hourly sea level pressure (PSL) model output in the two hemispheres (see detailed tracking algorithms in Appendix B). The annual ETC storm tracks over the Northern Hemisphere (NH) in ERA5 reanalysis (Figure 7a) show very clear high track densities over two regions separated by orographic features: the first region extends from high topography in East Asia (i.e., the Tibetan Plateau and the Altai-Sayan-Stanovoy range) into the western North Pacific, while the second region extends from the lee of the Rocky Mountains in North America, across the North Atlantic into Scandinavia and northern Russia. Different from the NH, the annual ETC tracks in the Southern Hemisphere (SH) show more continuous features with the highest track densities between 50°S and 70°S (Figure 7d). The low-resolution EAMv2 model produces a good representation of the observed spatial patterns of ETC track densities in both hemispheres (Figures 7b and 7e). Compared with the free-running simulations (i.e., CLIM, Figures 7b and 7e), no significant differences in the ETC track densities are seen in simulations with the ML bias correction (i.e., ML (CLIM), Figures 7c and 7f). The small differences between ML (CLIM) and CLIM are likely because the corrections by the ML model on wind and temperature do not lead to significant adjustments on the derived PSL (see Equation B1). As shown in Figure C4, the systematic model biases of PSL in CLIM are less than 2 hPa in most regions over the globe (Figure C4b). Therefore, we do not expect a strong correction from the ML model in these regions with small PSL biases (Figure C4c). However, the CLIM simulation does reveal large low-pressure biases in the Southern Ocean region (50°-70°S) (Figure C4b), which is co-located with the highest ETC track density region over the Southern Hemisphere (Figure 7d). In the same region, we see a reduction of maximum PSL biases in ML (CLIM) due to the correction by the ML model (Figure C4c).
Figure 7. Track density maps for total annual ETCs over the Northern Hemisphere (NH, top row) and Southern Hemisphere (SH, bottom row) tracked in the ERA5 reanalysis (panels a, d) and EAMv2 climate simulations without (i.e., CLIM, panels b, e) and with (i.e., ML (CLIM), panels c, f) ML bias correction. The ETC events and composite IVT are tracked with TempestExtremes using the 6-hourly mean sea level pressure (PSL) data from ERA5 reanalysis and EAMv2 simulations. The warm-core tropical-cyclone-like vortices were excluded during the feature tracking. The track densities shown are defined as the total number of 6-hourly time steps at which ETCs passed through each 8° × 8° grid box in each year from 1979 to 2014. Units are the number of 6-hourly ETC occurrences per 8° × 8° grid box per year.
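The track density definition used for Figure 7 (number of 6-hourly ETC occurrences per 8° × 8° grid box per year) can be sketched as a two-dimensional histogram of track points. The function name `track_density` and the sample coordinates are hypothetical; the tracker output format is an assumption.

```python
import numpy as np

def track_density(lats, lons, n_years, box=8.0):
    """Count 6-hourly track points per box-degree grid box, averaged per year.

    lats, lons : 1-D arrays with one entry per 6-hourly storm-center position
    n_years    : number of years covered by the track points
    """
    lat_edges = np.arange(-90.0, 90.0 + box, box)   # last box may extend past 90°
    lon_edges = np.arange(0.0, 360.0 + box, box)
    counts, _, _ = np.histogram2d(
        np.asarray(lats), np.asarray(lons) % 360.0, bins=[lat_edges, lon_edges]
    )
    return counts / n_years

# Three synthetic 6-hourly positions falling in the same 8° x 8° box, over 2 years.
density = track_density([40.0, 41.0, 44.0], [9.0, 10.0, 12.0], n_years=2)
```

The same function with `box=4.0` would correspond to a 4° × 4° definition of the kind used for the TC track densities later in this section.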
The reasonable representation of ETC occurrence in both CLIM and ML (CLIM) enables a fair comparison of the large-scale storm environment associated with ETCs through feature-oriented composite analyses. Figure 8 shows the composited 850-hPa temperature field, along with analogously calculated composites of 850-hPa wind vectors (Figures 8a-8c) and IVT (Figures 8d-8f). ERA5 reanalysis clearly shows the "warm conveyor belt" feature (e.g., Dettinger et al., 2015) with the advection of warm and moist air wrapping cyclonically around the eastern side of the storm center (Figures 8a and 8d). The CLIM simulation without ML bias correction shows systematic warm biases and spuriously large water vapor transport around the storm center, suggesting an overestimation of the advection of warm and moist air associated with the ETCs in the model (Figures 8b and 8e). A sizable cold temperature bias is also pronounced on the north side of the composite storm (Figure 8b). For ML (CLIM) with the ML bias correction, the biases in the temperature, wind, and vapor transport are reduced in the composite storms (Figures 8c and 8f), with more significant improvements in the IVT fields (Figure 8f). This results in a more realistic advection of temperature and humidity associated with the ETCs in the EAMv2 simulations. We also notice that the ML bias correction produces a weakening of the westward wind around the storm center (Figure 8d) compared with that in CLIM. This seems to be a physical response, as the corrections on the warm temperature bias around the storm center reduce the west-to-east temperature gradient featured in Figure 8b, leading to an adjustment of wind according to the thermal wind balance relationship. The results suggest that the ML model likely makes physically meaningful corrections on the EAMv2 simulations.
Figure 8. Composites of meteorological quantities centered on ETC storm center of all filtered storms with mean sea level pressure (PSL) less than or equal to 990 hPa in the ERA5 reanalysis (first column) and the differences between EAMv2 simulations and ERA5 reanalysis before (i.e., CLIM, second column) and after (i.e., ML (CLIM), third column) applying ML bias correction. The top row shows the composite of air temperature (contour, unit: K) and wind (vector, unit: m s⁻¹) at 850-hPa pressure level for (a) ERA5 reanalysis, (b) CLIM-ERA5 and (c) ML (CLIM)-ERA5; the bottom row shows the integrated vapor transport (IVT, unit: g kg⁻¹) for (d) ERA5 reanalysis, (e) CLIM-ERA5 and (f) ML (CLIM)-ERA5. All ETCs tracked with 6-hourly PSL fields during 1979-2014 are included in the composite by filtering out the storms with centered PSL >990 hPa. The white (panels a-c) and black (panels d-f) cross markers indicate the center of ETC storms.
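The feature-oriented composites discussed above can be sketched as an average of storm-centered windows, keeping only storms with central PSL of 990 hPa or less. This is a minimal illustration on index space (no regridding to storm-relative distance); the function name `storm_composite`, the storm tuple layout, and the synthetic field are assumptions, not the procedure used to produce Figure 8.

```python
import numpy as np

def storm_composite(field, storms, half_width, psl_max=990.0):
    """Average storm-centered windows of a (time, lat, lon) field.

    storms : iterable of (t, i, j, psl) with time index, center row, center
             column, and central sea level pressure in hPa
    """
    n_lat, n_lon = field.shape[1:]
    offsets = np.arange(-half_width, half_width + 1)
    windows = []
    for t, i, j, psl in storms:
        if psl > psl_max:                      # keep only sufficiently deep storms
            continue
        rows = i + offsets
        if rows[0] < 0 or rows[-1] >= n_lat:   # skip windows crossing a pole
            continue
        cols = (j + offsets) % n_lon           # wrap around in longitude
        windows.append(field[t][np.ix_(rows, cols)])
    if not windows:
        raise ValueError("no storms passed the filters")
    return np.mean(windows, axis=0)

# Synthetic field with one anomaly at each storm center (illustrative values).
field = np.zeros((3, 20, 30))
field[0, 10, 0] = 1.0
field[1, 5, 15] = 3.0
field[2, 8, 8] = 100.0
storms = [(0, 10, 0, 985.0), (1, 5, 15, 990.0), (2, 8, 8, 1000.0)]
composite = storm_composite(field, storms, half_width=2)
```

The third storm is excluded by the pressure filter, so the composite center is the mean of the first two anomalies only.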
The responses of the ETC track densities to the future climate change in the Northern Hemisphere (NH) are shown in Figure 9. The results suggest that climate change with warmer sea surface temperature and lower sea-ice concentrations leads to a reduction of the storm track density around the Arctic (Figure 9a). Stronger signals in the ETC track density responses are observed in the simulations with the higher-emission climate change scenario of SSP5-8.5 (Figure 9c). In addition, the signals of the ETC track density responses to the climate change from the ML bias-corrected simulations (ML (CLIM), Figures 9b and 9d) agree closely with those from the free-running EAMv2 simulations (CLIM, Figures 9a and 9c). Similar conclusions can be drawn for ETC track density responses over the Southern Hemisphere (SH, Figure C5).
For the responses of the ETC intensity, Figure 10 shows the changes of composited mean sea level pressure (PSL) fields in response to the imposed climate change perturbations from the SSP2-4.5 (top row) and SSP5-8.5 (bottom row) emission scenarios. The results suggest that global warming may favor more intense ETCs as there is a reduction of the storm center sea level pressure in the EAMv2 simulations of SSP2-4.5 (Figure 10a) and SSP5-8.5 (Figure 10c). Like tropical cyclones, the intensity of ETCs is not expected to be well simulated by the low-resolution EAMv2 model as it lacks the resolution to fully resolve the storm dynamics. However, analyses of the composite large-scale storm environment suggest that global warming leads to warmer temperatures on the west side of the storm and increased water vapor transport on the east side of the storm (contour and shading in Figures 11a and 11c). Meanwhile, there is an enhanced cyclonic circulation in the boundary layer regions (i.e., 850 hPa) due to climate change (vectors in Figures 11a and 11c). These changes in the storm environment suggest enhanced warm and moist air advection wrapping cyclonically around the storm center, favoring the development and formation of more intense ETCs, consistent with the sea level pressure changes.
The climate change signals in the ETC intensity and storm environment from the EAMv2 simulations with ML bias correction largely agree with those in the free-running PGW simulations for both SSP2-4.5 and SSP5-8.5. This was concluded by viewing the responses of ETC-composited PSL (Figure 10) and large-scale circulation and moisture transport at 850-hPa pressure level (Figure 11). Again, this suggests that the ML bias correction preserves the climate change signals associated with ETCs. However, different from the ETC track density, the ML bias correction shows noticeable impacts on the magnitude of the responses of the ETC intensity and storm environment to climate change. Specifically, the responses of PSL to the perturbations of SSP2-4.5 and SSP5-8.5 become weaker after applying the ML bias correction (Figures 10b and 10d), compared with those in free-running simulations (Figures 10a and 10c). Accordingly, a weaker change of the temperature and water vapor transport is also evidenced in the simulations with ML bias correction (Figures 11b and 11d versus Figures 11a and 11c). The modifications of the ML bias correction on the climate change signals are likely reasonable. As shown in Figure C1, the EAMv2 model significantly overestimates the humidity over the ETC active regions (e.g., 50-70°S), which likely explains the significant overestimation of the composite water vapor transport in Figure 8e. The effective corrections by the ML model on these biases tend to reduce the model-simulated humidity and water vapor transport associated with ETCs. Such corrections from the ML model are also expected to take effect in the ML (SSP245) and ML (SSP585) simulations. Also noteworthy is that the ML bias correction preserves the physical relationships between the storm environment and storm intensity, both showing smaller changes in the future compared to the changes simulated without bias correction.

Tropical Cyclones (TCs)
Tropical cyclones (TCs) are low-pressure systems that typically form in lower-latitude regions and can cause devastating and widespread geophysical hazards in the global tropics and subtropics. Previous studies have evaluated the frequency and distribution of TCs in an earlier version of the EAM model at ∼100 km resolution and found that the model significantly underestimates the occurrence frequency and intensity of TCs (Balaguru et al., 2020). The same conclusions can be drawn for the low-resolution EAMv2 simulations in our study (see Figure C6 in Appendix). Therefore, direct evaluation of TempestExtremes-derived metrics for TCs is of limited value for assessing the impacts of the ML bias correction model. In this section, we instead focus on the evaluation of large-scale environmental conditions that are key drivers governing TC formation and development. These large-scale storm environments usually operate on scales of the order of tens of thousands of kilometers, which can be resolved by the EAMv2 model.
[Figure caption fragment] ...panel d) for the EAMv2 simulations and with ML bias correction. All ETCs with mean sea level pressure (PSL) less than or equal to 990 hPa are included for the metrics. The ETCs are tracked with 6-hourly PSL fields from EAMv2 simulations from 1979 to 2014. The annual track density is defined as the total number of time steps (6 hr) the ETCs passed over an 8° × 8° grid box per year. The dotted region indicates that the differences are significant at a 90% significance level.
Figure 12 shows the climatological seasonal mean TC genesis potential index (GPI), potential intensity (PI), and vertical wind shear between 200 and 850 hPa. Here, the GPI and PI are defined using the large-scale vorticity, vertical wind shear, potential intensity, and humidity fields following Camargo et al. (2007) (also see discussion in Appendix B). The differences between CLIM and ERA5 in terms of the GPI and vertical wind shear (Figures 12b and 12h) are systematically reduced over most of the TC basins when the ML bias correction is applied to the U, V, T, and Q fields (Figures 12c and 12i). Although ML (CLIM) produces larger biases in PI than CLIM over the tropical Indian and western Pacific Ocean regions within 5°S-5°N, it effectively reduces (Figure 12f) the significant positive biases of PI in CLIM (Figure 12e) over the regions with maximum track densities (see Figure C6b). Overall, we conclude that ML bias correction shows the potential to improve the representation of large-scale storm environments associated with TCs in the low-resolution EAMv2. Further analysis from Equation B7 suggests that these improvements are possibly due to the bias reduction in large-scale U, V, T, and Q fields with the ML bias correction. In addition, the CLIM simulation, as shown in Figure 12h, features a noticeable overestimation of wind shear over the tropical eastern Pacific and Atlantic Ocean regions. This could also partly account for the significant underestimation of TC track densities over the Northeast Pacific and North Atlantic basins (Figure C6b versus Figure C6a) because the activity of the tropical easterly waves over the tropical eastern Pacific and Atlantic Oceans is known to be a key driver of TC genesis. Interestingly, the biases in wind shear over these two regions in CLIM are significantly reduced in ML (CLIM) after applying the ML corrections (Figure 12i).
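As an illustration of how a wind shear bias follows from biases in the large-scale winds, the sketch below computes the vertical wind shear as the magnitude of the vector wind difference between 200 and 850 hPa. This is a common definition and is assumed here; the exact formulation used in this study is the one given in Appendix B, and the function name and sample values are hypothetical.

```python
import numpy as np

def vertical_wind_shear(u200, v200, u850, v850):
    """Magnitude of the 200-850 hPa vector wind difference (m/s)."""
    return np.hypot(u200 - u850, v200 - v850)

# Illustrative winds (m/s) at two grid points; not model output.
u200 = np.array([13.0, 20.0])
v200 = np.array([4.0, 0.0])
u850 = np.array([10.0, 5.0])
v850 = np.array([0.0, 0.0])

shear = vertical_wind_shear(u200, v200, u850, v850)

# A +2 m/s bias in the 200-hPa zonal wind alone changes the derived shear.
shear_biased = vertical_wind_shear(u200 + 2.0, v200, u850, v850)
shear_bias = shear_biased - shear
```

Because the shear is a nonlinear function of four wind components, corrections to U and V at both levels are needed to reduce its bias consistently.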
To test whether the ML corrections on large-scale wind indeed lead to changes in TC activities, we employ the Risk Analysis Framework for Tropical Cyclones (RAFT, W. Xu et al., 2021) for a complementary assessment. The TC track model of RAFT is used to simulate TC track density given the climatological steering winds. Comparison of the TC track density produced by RAFT as driven by EAMv2 simulated steering winds with and without ML bias correction provides an assessment of the large-scale TC environment in the simulations. Following W. Xu et al. (2021), we used the 6-hourly large-scale wind fields (i.e., U and V) at 200 hPa and 850 hPa from ERA5 reanalysis, CLIM, and ML (CLIM) to generate three sets of synthetic TC tracks with RAFT, respectively. We hypothesize that the synthetic tracks should agree better with those obtained with ERA5 reanalysis if the ML bias correction improves the large-scale wind fields.
[Figure caption fragment] ...10 but for responses (Δ) of temperature (blue contours, unit: K) and wind (black vectors) at 850-hPa pressure level as well as the vertically integrated vapor transport (IVT, unit: kg m⁻¹ s⁻¹) to climate change following SSP2-4.5 (first row) and SSP5-8.5 (second row) emission scenarios. See Table 1 for a detailed description of EAMv2 simulations.
Figure 13 shows the annual mean TC track density over the Atlantic basin from the RAFT forecast. Compared to CLIM (Figure 13b), ML (CLIM) shows a better agreement with ERA5 reanalysis, with a clear reduction of track density biases (Figure 13c). The basin mean track density biases are reduced by more than 50%, as shown by the numbers in the top right corner of Figures 13b and 13c. These results support our hypothesis as discussed above, and demonstrate that the ML bias correction improves the RAFT TC track forecasts and thus the downscaling analysis of the statistics of TC track densities. Similar track density forecasts from RAFT are also generated for EAMv2 PGW simulations with and without ML bias correction. As shown in Figures 13d and 13f, the climate-change-induced changes in the large-scale wind fields lead to a significant increase in the number of TCs over the Atlantic basin, especially in the coastal regions of the eastern U.S. These climate change signals, which have been linked to the warmer SSTs over the eastern tropical Pacific Ocean under warming (Balaguru et al., 2023) (also seen in Figures A1a and A1b), are still seen in the simulations after applying the ML bias correction (Figures 13e and 13g), suggesting that the ML bias correction on the large-scale wind fields preserves the climate change signals and the associated TC track responses as seen in the free-running simulations. Moreover, we observe differences in the magnitude of the responses of TC track density over the eastern U.S. and Gulf of Mexico coastal regions before and after ML bias correction (Figures 13e and 13g versus Figures 13d and 13f). This reflects the impact of ML bias correction on the RAFT forecasts and the associated TC track responses through modifications on the large-scale wind fields in EAMv2 simulations. As the ML bias correction produces a more reliable representation of large-scale wind fields in EAMv2, higher confidence could be given to the results drawn from Figures 13e and 13g. Overall, the improved representation of large-scale storm environments (e.g., large-scale wind) by ML bias correction is beneficial for obtaining a more reliable downscaling of high-impact weather systems such as TCs.
[Figure caption fragment] ..., CLIM-ERA5); Third column: the same as the second column but for the EAMv2 simulations with ML bias correction (i.e., ML (CLIM)-ERA5, panels c, f, i). The monthly mean model output from ERA5 reanalysis and EAMv2 simulations are used to calculate the GPI, PI, and vertical wind shear following Equation B7. For all panels, the seasonal mean values are computed for August to October in the Northern Hemisphere and for January to March in the Southern Hemisphere. Detailed description of EAMv2 simulations can be found in Table 1.
Figure 13. Annual mean TC track density over the Atlantic basin from RAFT forecast driven by the large-scale environmental wind fields from ERA5 reanalysis (panel a), and the differences in track densities between RAFT forecasts driven by ERA5 and EAMv2 simulations (panels b-c), and by EAMv2 simulations in present-day and pseudo-global warming scenarios (panels d-g), respectively. Panels (b-c) show the CLIM-ERA5 (panel b) and ML (CLIM)-ERA5 (panel c) for present-day EAMv2 simulations without and with ML bias correction, respectively. Panels (d-g) show the SSP245-CLIM (panel d) and SSP585-CLIM (panel f) for EAMv2 simulations without ML bias correction, as well as ML (SSP245)-ML (CLIM) (panel e) and ML (SSP585)-ML (CLIM) (panel g) for EAMv2 simulations with ML bias correction. The SSP245 (or ML (SSP245)) and SSP585 (or ML (SSP585)) are two PGW simulations with imposed climate change perturbations in sea surface temperature and sea-ice concentrations derived for SSP2-4.5 and SSP5-8.5 future climate scenarios. More detailed descriptions of simulations can be found in Table 1. The 6-hourly zonal wind (U) and meridional wind (V) at 200 and 850 hPa during 1979-2014 from ERA5 reanalysis and EAMv2 simulations are used to drive RAFT TC track forecasts following W. Xu et al. (2021). The annual track density is defined as the total number of 6-hourly tracks that pass through a 4° × 4° grid box per year. The gray dots in panels b and c indicate the differences are significant at a 95% confidence level.

Conclusions
Bias correction has been a commonly used approach when applying climate model outputs to impact studies. This study employed a machine-learning-based (ML) bias correction approach to improve the representation of the large-scale wind (U, V), temperature (T), and humidity (Q) in the climate simulations conducted with DOE's E3SM Atmosphere Model (EAM). The performance of the ML bias correction method in producing large-scale storm environments associated with high-impact weather systems is evaluated for both present-day (i.e., historical) and climate change scenarios.
Globally, the results show that the ML bias correction method performs well in reducing the overall biases in U, V, T, and Q fields from the climate model simulations. Compared with the wind fields, more promising corrections are found in the thermodynamical fields (i.e., T and Q), especially in the tropics and midlatitude regions and over the lower troposphere (see Figures 1 and C1). As reported in previous studies (S. Zhang et al., 2022), biases are more pronounced in these fields compared to the winds. Therefore, there is more room for these larger biases to be corrected during training for these fields compared to winds. When looking at the mean values (global and regional means), bias correction reduces the biases in all fields at most model levels, with a systematic bias reduction of 10%-20% (Figure 2). The same ML bias correction approach is then applied to process the PGW simulations from EAMv2 forced with the imposed climate change perturbations in sea surface temperature (SST) and sea-ice concentration (SIC) derived for the future climate scenarios of SSP2-4.5 and SSP5-8.5. The ML bias correction is found to constrain the probability density function (PDF) of the large-scale model state variables in historical simulations toward a better agreement with the observations. Similar shifting of the PDFs by ML bias correction is also seen in the PGW simulations, while the large-scale climate change signals of the model state (e.g., temperature and humidity) are well preserved before and after the ML bias correction (see Figure 4). This study further demonstrated the value of the employed ML bias correction in the assessment of high-impact weather systems that have the potential to generate extreme weather events. We used the model state of U, V, T, and Q with and without ML bias correction in the low-resolution EAM model to derive the long-term statistics, and evaluated the skills of bias correction in improving the model representation of the high-impact weather systems (e.g., occurrence frequency, intensity, and storm environment) in both present-day and PGW scenarios, with a focus on atmospheric rivers (ARs), ETCs, and tropical cyclones (TCs). The results show that the large-scale vapor transport associated with ARs is more realistically represented in the bias-corrected data sets than in those without bias correction, leading to a better representation of the occurrence frequency and the strength of ARs in the EAM model (see Section 4.1). Similarly, more realistic representations of ETC structure and ETC-induced changes in water vapor transport and thermodynamical flows are also obtained in the simulations with ML bias correction (see Section 4.2). When the ML bias-corrected large-scale winds are used to drive a TC track forecast model for downscaling analysis of TC activities over the Atlantic basin, the resulting TC track forecasts agree better with the results driven by observations (see Section 4.3). In addition, the ML bias correction does not significantly change the patterns of the responses of occurrence frequency and intensity of the three types of extreme events to pseudo-global warming effects, but there are obvious differences in the magnitude of the responses before and after the ML bias correction. Analysis of the ETC response to climate change shows that the ML bias correction preserves the physical relationship between the storm environment and storm intensity. Overall, the findings in this study suggest that the proposed machine learning bias correction is a useful approach to facilitate the downscaling of high-impact weather systems for low-resolution climate models by providing more realistic large-scale environment information.
While the proposed ML bias correction was demonstrated to be effective in the assessment of high-impact weather systems in the low-resolution EAMv2 climate model, some limitations of the current setup should be stated. First, as mentioned in Section 2.3, the ML model employed in this study for bias correction acts on the coarse-resolution climate model simulation in a postprocessing manner. The goal of this debiasing process is to bring the climate statistics of quantities predicted by the coarser-resolution numerical model into better agreement with the reference as quantified by the ERA5 data set. Therefore, our ML bias correction approach is not expected to correct biases in the dynamics of the model system. Nevertheless, the offline ML bias correction proposed in our study can potentially be combined with an online bias correction approach targeting direct corrections on the model dynamics to achieve further bias reduction for coarser-resolution climate model predictions. Second, the ML bias correction in this study was only applied to improve the statistics of large-scale model state variables, namely U, V, T, and Q fields. The ability of the ML approach to bias correct other physical quantities such as precipitation and radiative fluxes and to preserve the physical consistency and conservation laws of mass and energy has not been explored and discussed. However, the results of this study have demonstrated that our ML model is in a good position for further exploration of these issues. Lastly, the ML bias correction is currently trained and verified only for the EAMv2 climate model. Applying the ML approach to correct other weather and climate models could be possible with retraining, but this has not been tested and verified. Future work will focus on addressing the limitations of our ML bias correction model as mentioned here and exploring the potential application of our ML architecture to other weather and climate models. We will report our findings in separate publications.
Appendix A: Supplementary Material for Section 2

A1. Additional Tables and Figures for Section 2.2
A2. Machine Learning Framework
The machine learning model for bias correction in Section 2.2 utilizes the same convolutional-LSTM hybrid neural network (NN) architecture described in Barthel Sorensen et al. (2024). The network takes as its input snapshots of the entire horizontal discretization of all prognostic variables (i.e., U, V, T, Q) at a single sigma level of the EAMv2 model. A custom "split" layer first separates the input into non-overlapping subregions. Each subregion is then independently passed through a series of convolutional layers tasked with extracting local flow features. Next, the local information extracted from each subregion is concatenated into a single vector via a custom "merge" layer and projected onto a reduced-order latent space via a linear fully connected layer. This latent space representation is fed through an LSTM layer before being projected back to physical space via another linear fully connected layer. The resulting global information is then split into the same subregions as the input and distributed to a series of independent deconvolution layers that upscale the data to the original resolution. Finally, a custom "merge" layer gathers the information from each subregion and produces the final corrected snapshot. The mathematical framework and algorithms for the machine learning operator are introduced below.
Consider a coarse discretization of a dynamical system, in this case the EAMv2 model, describing the evolution of the vector quantity v:

$$\frac{\partial v}{\partial t} = f(v). \tag{A1}$$

The high-fidelity reference solution, in this case ERA5 data, is represented by u. The objective of the ML framework we employ is to capture the long-term statistics of u by solving the imperfect model (Equation A1) and then applying a correction operator, G, to that computed solution.
An ML model trained naively on a pair of arbitrary trajectories (v, u) is unlikely to generalize, as it will be corrupted by the effects of chaotic divergence. Chaotic divergence is the inherent property of all turbulent systems that any two trajectories, which may initially be arbitrarily close, will eventually diverge, making a mapping between them meaningless. To minimize this effect, the correction operator G is trained not on an arbitrary pair of trajectories but specifically on the pair ($v_\tau$, u), where $v_\tau$ is the solution to the coarse model nudged toward the reference data,

$$\frac{\partial v_\tau}{\partial t} = f(v_\tau) - \frac{1}{\tau}\left(v_\tau - P u\right). \tag{A2}$$

Here P is an operator which projects the reference solution onto the coarse grid. The constant τ is a user-defined parameter that represents the timescale over which the nudging tendency acts. While this value is chosen such that the nudging term is smaller than all others, it still creates small discrepancies between the spectra of the nudged solution, $v_\tau$, and the free coarse solution, v.
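The nudging step can be illustrated with a toy scalar system. The following is a minimal sketch only, assuming a forward-Euler time step and a hypothetical one-variable "coarse model"; the function names, the toy tendency, and the value of τ are illustrative assumptions and are not taken from EAMv2 or from Barthel Sorensen et al. (2024).

```python
from typing import Callable, List, Sequence


def integrate_nudged(
    f: Callable[[float], float],
    reference: Sequence[float],
    v0: float,
    dt: float,
    tau: float,
) -> List[float]:
    """Forward-Euler integration of dv/dt = f(v) - (v - Pu) / tau.

    `reference` holds the reference solution already projected onto the
    coarse grid (the term P u), sampled every `dt`. Passing tau = inf
    switches the nudging off and gives the free-running solution.
    """
    v = v0
    trajectory = [v]
    for pu in reference[:-1]:
        tendency = f(v) - (v - pu) / tau  # model tendency plus nudging term
        v = v + dt * tendency
        trajectory.append(v)
    return trajectory


# Hypothetical biased coarse model: relaxes toward 1.0, reference stays at 0.0.
def coarse_tendency(v: float) -> float:
    return -(v - 1.0)


reference = [0.0] * 2001
free_run = integrate_nudged(coarse_tendency, reference, 0.0, 0.01, float("inf"))
nudged = integrate_nudged(coarse_tendency, reference, 0.0, 0.01, 10.0)
```

In this toy setting the free run settles at the biased state 1.0, while the weakly nudged run settles slightly closer to the reference, at 1/1.1, which mirrors the small discrepancy between nudged and free solutions noted in the text.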
If left unaddressed, this discrepancy will hinder the ability of the machine-learned map G to generalize to free-running data. To remedy this issue, the spectrum of the nudged trajectory, $v_\tau$, is rescaled to match the spectrum of the free-running coarse model. Specifically, let $\hat{q}_k = \mathcal{F}[q]$ be the spatial Fourier transform of an arbitrary field q. The spectral energy is then defined as

$$\mathcal{E}_k[q] = \frac{1}{T}\int_0^T \left|\hat{q}_k\right|^2 dt, \tag{A3}$$

and the energy ratio between v and $v_\tau$ is defined as

$$R_k = \sqrt{\frac{\mathcal{E}_k[v]}{\mathcal{E}_k[v_\tau]}}. \tag{A4}$$

The spectrum-matched nudged solution is then defined as the inverse Fourier transform of the spectrally rescaled nudged solution:

$$v'_\tau = \mathcal{F}^{-1}\left[R_k\, \hat{v}_{\tau,k}\right]. \tag{A5}$$

Training the correction operator then reduces to a supervised learning problem with the objective function

$$\mathcal{L} = \int_0^T \left\| u - G\left[v'_\tau\right] \right\|^2 dt. \tag{A6}$$

After the correction operator, G, is trained on the spectrally corrected nudged data, during testing it is applied to the free-run coarse model trajectory v(t). The resulting corrected trajectory constitutes our ML prediction and is

$$u_{\mathrm{ML}}(t) = G\left[v(t)\right], \tag{A7}$$

which is then used to compute statistics and other properties of interest. We refer the interested reader to Barthel Sorensen et al. (2024) for a more detailed discussion of the mathematical framework and network architecture.

Note (Table A1). The monthly mean model output of "tos" (SST) and "siconc" (SIC) from the "r1i1p1f1" experiment conducted for the "SSP2-4.5" and "SSP5-8.5" scenarios was extracted and used.
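The spectrum-matching step described above can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions: it uses a naive discrete Fourier transform on short periodic 1-D snapshots rather than the spatial transform of full model fields, it takes the rescaling factor to be the square root of the ratio of time-mean spectral energies, and all function names and the small threshold `eps` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive discrete Fourier transform (adequate for short examples)."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def idft(xk: Sequence[complex]) -> List[complex]:
    """Inverse of `dft`."""
    n = len(xk)
    return [
        sum(xk[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
        for m in range(n)
    ]


def spectral_energy(snapshots: Sequence[Sequence[float]]) -> List[float]:
    """Time-mean of |q_k|^2 for each wavenumber k."""
    spectra = [dft(s) for s in snapshots]
    return [
        sum(abs(sp[k]) ** 2 for sp in spectra) / len(spectra)
        for k in range(len(spectra[0]))
    ]


def match_spectrum(nudged, free, eps: float = 1e-12) -> List[List[float]]:
    """Rescale each wavenumber of the nudged snapshots so that their
    time-mean spectral energy matches that of the free-running snapshots."""
    e_free = spectral_energy(free)
    e_nudged = spectral_energy(nudged)
    # Leave (numerically) empty wavenumbers untouched to avoid dividing by ~0.
    ratio = [
        math.sqrt(ef / en) if en > eps else 1.0
        for ef, en in zip(e_free, e_nudged)
    ]
    matched = []
    for snap in nudged:
        scaled = [r * c for r, c in zip(ratio, dft(snap))]
        matched.append([z.real for z in idft(scaled)])
    return matched


# Synthetic snapshots: same wavenumbers, different amplitudes and phases.
n = 8
free = [
    [2.0 * math.cos(2 * math.pi * m / n + p) + 0.5 * math.cos(4 * math.pi * m / n)
     for m in range(n)]
    for p in (0.0, 0.7, 1.9)
]
nudged = [
    [math.cos(2 * math.pi * m / n + p) + math.cos(4 * math.pi * m / n)
     for m in range(n)]
    for p in (0.3, 1.1, 2.5)
]
matched = match_spectrum(nudged, free)
```

Only the amplitudes are rescaled; the phases of the nudged snapshots are preserved, so the matched data remain time-aligned with the reference used for training.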

Appendix B: Additional Notes for Section 2.4 and Section 4
Section 4 discussed three high-impact weather systems: atmospheric rivers (ARs), extratropical cyclones (ETCs), and tropical cyclones (TCs). Table B1 documents the feature tracking information used to derive the metrics discussed there.
The TempestExtremes package (P. A. Ullrich et al., 2021) was employed for feature detection and tracking of these weather phenomena using the 6-hourly model output from EAMv2. Specifically, the TCs and ETCs were tracked with mean sea level pressure (PSL), while the ARs were tracked with the two components of the integrated water vapor flux (TUQ and TVQ). In this paper, the same algorithm and parameter setups described in P. A. Ullrich et al. (2021) were used by TempestExtremes for feature tracking of each extreme weather event in EAMv2 simulations:

• For TCs, our configuration for TempestExtremes identifies the features on the grid points that have both a PSL minimum and an upper-level warm core. Specifically, the candidate model grid points are first identified and tagged by minima in PSL and then eliminated if a more intense minimum exists within a great-circle distance (GCD) of 6°. The closed contour criteria are then applied, requiring an increase in PSL of at least 2 hPa within 5.5° GCD of the candidate node, and a decrease of at least 0.6 K in the average temperature over the 200- and 500-hPa pressure levels within 5.5° GCD of the node that has the maximum air temperature within 1° GCD of the candidate. Meanwhile, the maximum magnitude of the vector wind at 10 m altitude (estimated with the wind fields at the bottom model level of EAMv2) within 2° GCD of the candidate and the surface height at the candidate point were also identified and output by TempestExtremes. All threshold values selected here for TC feature tracking are based on Table 1 in Zarzycki and Ullrich (2017). After identifying TC candidates on each time slice (i.e., every 6 hr), the "stitching" step in TempestExtremes (P. A. Ullrich et al., 2021) was further employed to ensure that the identified features can be classified as tropical storms: the wind magnitude (derived from the wind fields at the bottom model level) must be greater than 10 m s⁻¹ for at least 10 time slices (i.e., 60 hours), the latitude of the feature must be between 50°S and 50°N for at least 10 time slices, the feature must exist at an elevation below 150 m for at least 10 time slices, and the maximum distance between feature candidates in consecutive time slices is 8.0° GCD. In addition, we only kept the TC candidates that persist for at least 24 hr with a maximum gap size of 6 hr for evaluation and analysis in Section 4.

• For ETCs, our configuration for TempestExtremes identified the features on the grid points with PSL minima, and the minimum PSL must be enclosed by a closed contour of 2 hPa within 5.5° GCD of the feature center. Unlike TCs, ETCs do not have a distinct warm core structure. Therefore, the above-mentioned TC warm core criteria, defined with the average temperature over the 200- and 500-hPa pressure levels, were applied to exclude the TCs from the tracked ETC candidates. During the "stitching" step, we further filtered the tracked feature candidates with the following criteria: the feature must exist at an elevation below 1500 m for at least 8 time slices (i.e., 48 hours), and the maximum distance between feature candidates in consecutive time slices is 6.0° GCD. In addition, we only kept the ETC candidates that persist for at least 60 hr with a maximum gap size of 18 hr for evaluation and analysis in Section 4.

• For ARs, our configuration for TempestExtremes first calculated the Laplacian of the IVT field using eight radial points at 10° GCD from each candidate model grid point. Following McClenny et al. (2020), AR features were then identified and tagged on the grid points where the Laplacian of the IVT field is below a fixed threshold of −4 × 10⁴ kg m⁻² s⁻¹ rad⁻². Finally, we removed the features too near the Equator and those deemed too small using filtering criteria typically used for AR trackers (Shields et al., 2018): each blob must have a minimum area of 4 × 10⁵ km², and the latitude of each tagged grid point must be at least 15°. The grid points belonging to a tropical cyclone identified in the first item were also excluded from the ARs.
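The first step of the TC and ETC detection above, eliminating a PSL minimum when a more intense one lies within a given great-circle distance, can be sketched as follows. This is an illustrative re-implementation in plain Python and not the TempestExtremes code; the function names, the dictionary layout of a candidate, and the sample values are assumptions.

```python
import math
from typing import Dict, List


def gcd_degrees(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in degrees of arc (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def merge_candidates(candidates: List[Dict[str, float]],
                     merge_dist_deg: float = 6.0) -> List[Dict[str, float]]:
    """Keep a PSL minimum only if no deeper minimum lies within merge_dist_deg."""
    kept = []
    for c in candidates:
        has_deeper_neighbor = any(
            o is not c
            and o["psl"] < c["psl"]
            and gcd_degrees(c["lat"], c["lon"], o["lat"], o["lon"]) <= merge_dist_deg
            for o in candidates
        )
        if not has_deeper_neighbor:
            kept.append(c)
    return kept


# Hypothetical candidates on one time slice (PSL in Pa).
candidates = [
    {"lat": 20.0, "lon": -60.0, "psl": 99000.0},
    {"lat": 22.0, "lon": -58.0, "psl": 99500.0},   # shallower, about 2.7 deg away
    {"lat": 40.0, "lon": -30.0, "psl": 100000.0},
]
kept = merge_candidates(candidates)
```

The closed-contour, warm-core, and stitching criteria would be applied to the surviving candidates; they are omitted here because they require the full gridded fields.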
With the feature tracking algorithms described above, 6-hourly TC and ETC tracks in a format similar to the National Hurricane Center's HURDAT2 database (Landsea & Franklin, 2013), and 6-hourly AR mask data on the 1.5° × 1.5° global lat-lon grid, were obtained for the EAMv2 simulations in Table 1. These tracking results were used to derive occurrence frequency metrics for ARs (Figure C3, Figures 5 and 6), ETCs (Figure C5 and Figures 7 and 9), and TCs (Figure C6), as shown in Section 4 and Appendix A1. This was achieved by binning the 6-hourly TC and ETC track data into grid boxes of a selected size (described in the caption of each figure), or by directly calculating the counts with the AR mask data. In addition, the "NodeFileCompose" function built into TempestExtremes was employed for the composite analysis and metrics for ETCs shown in Figures 8, 10, and 11. Specifically, NodeFileCompose first projects the model fields (e.g., wind, temperature, etc.) onto the stereographic plane centered on the ETC track to form a storm-centered snapshot for each field at each time slice (6 hr). The composite of each field is then derived by averaging all 6-hourly snapshots generated in the first step. More detailed descriptions and verifications of the composite analysis method can be found in P. A. Ullrich et al. (2021), and the specific rules for the composite analysis in our study are also described in the captions of Figures 8, 10, and 11.
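The binning of 6-hourly track points into a track density can be sketched as below. This is a minimal illustration, assuming track points are given as (latitude, longitude) pairs and that density is the count per box divided by the number of years; the function name and the box indexing are hypothetical, and the box size should follow the figure captions (for example 4° × 4° for TCs and 8° × 8° for ETCs).

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def track_density(points: Iterable[Tuple[float, float]],
                  n_years: float,
                  box_deg: float = 4.0) -> Dict[Tuple[int, int], float]:
    """Average number of 6-hourly track points per year in each lat-lon box.

    Boxes are indexed by (i, j), where i counts boxes northward from 90S
    and j counts boxes eastward from 0E.
    """
    n_lat = int(math.ceil(180.0 / box_deg))
    n_lon = int(math.ceil(360.0 / box_deg))
    counts: Counter = Counter()
    for lat, lon in points:
        i = min(int((lat + 90.0) // box_deg), n_lat - 1)  # clamp the North Pole
        j = int((lon % 360.0) // box_deg) % n_lon         # accept negative longitudes
        counts[(i, j)] += 1
    return {box: n / n_years for box, n in counts.items()}


# Hypothetical track points collected over two years.
points = [(10.0, 281.0), (11.0, 282.5), (30.0, -75.0)]
density = track_density(points, n_years=2, box_deg=4.0)
```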
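In addition, as the ML bias correction was applied to U, V, T, and Q at each grid point and model level, the feature tracking quantities, including PSL and IVT in this study, were diagnosed offline for the simulations with and without ML bias correction listed in Table 1. Specifically, the PSL was diagnosed with the algorithm proposed by the European Centre for Medium-Range Weather Forecasts (ECMWF; Trenberth et al., 1993):

$$PSL = P_s \exp\left[\beta\left(1 - \frac{\alpha\beta}{2} + \frac{(\alpha\beta)^2}{3}\right)\right], \tag{B1}$$

$$\beta = \frac{g Z_s}{R_d T_*}, \tag{B2}$$

$$T_* = T_{bot}\left[1 + \alpha\left(\frac{P_s}{P_{bot}} - 1\right)\right], \tag{B3}$$

$$\alpha = \frac{R_d \Gamma_0}{g}. \tag{B4}$$

Here, $P_s$ and $Z_s$ are the surface pressure and surface geopotential height, respectively. $T_{bot}$ and $P_{bot}$ are the air temperature and pressure at the bottom model level, respectively. These quantities are taken directly from the model output of the simulations in Table 1. $T_*$ is the surface temperature extrapolated from the bottom model level. $\Gamma_0$ (= 6.5 × 10⁻³ K m⁻¹) is the temperature lapse rate. $R_d$ and g denote the dry air gas constant and gravitational acceleration, respectively.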
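The sea level pressure diagnosis can be sketched as follows. This is a hedged illustration of the basic form of the Trenberth et al. (1993) reduction as commonly implemented in atmosphere models; it omits the special-case adjustments for very warm or very cold surfaces, and the function name and the numerical values of the physical constants are assumptions rather than values confirmed by the paper.

```python
import math

R_D = 287.04      # dry air gas constant, J kg^-1 K^-1 (assumed value)
GRAV = 9.80616    # gravitational acceleration, m s^-2 (assumed value)
GAMMA_0 = 6.5e-3  # temperature lapse rate, K m^-1


def sea_level_pressure(p_s: float, z_s: float, t_bot: float, p_bot: float) -> float:
    """Reduce surface pressure p_s (Pa) at surface height z_s (m) to sea level.

    t_bot (K) and p_bot (Pa) are the temperature and pressure at the
    bottom model level.
    """
    alpha = R_D * GAMMA_0 / GRAV
    # Surface temperature extrapolated from the bottom model level.
    t_star = t_bot * (1.0 + alpha * (p_s / p_bot - 1.0))
    beta = GRAV * z_s / (R_D * t_star)
    x = alpha * beta
    return p_s * math.exp(beta * (1.0 - x / 2.0 + x * x / 3.0))


# Hypothetical grid point 1000 m above sea level.
psl = sea_level_pressure(p_s=90000.0, z_s=1000.0, t_bot=280.0, p_bot=89500.0)
```

Over the ocean, where the surface height is zero, the function returns the surface pressure unchanged, which is the expected limiting behavior.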
Moreover, the water vapor transport in the atmosphere consists of the eastward (TUQ) and northward (TVQ) components, which were derived directly from the U, V, and Q fields of the model output using:

$$TUQ = \frac{1}{g}\int_{P_t}^{P_s} U\, Q\, dP, \tag{B5}$$

$$TVQ = \frac{1}{g}\int_{P_t}^{P_s} V\, Q\, dP, \tag{B6}$$

where dP is the layer thickness, and the integral was computed from the surface (i.e., $P_s$) to the top model level (i.e., $P_t$) at ∼0.2 hPa.
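On discrete model layers the vertical integral becomes a sum over layer thicknesses. The sketch below assumes a single column with layer-mean values of U, V, and Q, and takes the IVT magnitude to be the norm of the (TUQ, TVQ) vector; the function names, the value of g, and the sample column are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

GRAV = 9.80616  # gravitational acceleration, m s^-2 (assumed value)


def integrated_flux(wind: Sequence[float], q: Sequence[float],
                    dp: Sequence[float]) -> float:
    """(1/g) * sum over layers of wind * q * dp, in kg m^-1 s^-1.

    wind in m s^-1, q in kg kg^-1, dp (layer pressure thickness) in Pa.
    """
    return sum(w * qk * d for w, qk, d in zip(wind, q, dp)) / GRAV


def water_vapor_transport(u: Sequence[float], v: Sequence[float],
                          q: Sequence[float],
                          dp: Sequence[float]) -> Tuple[float, float, float]:
    """Return (TUQ, TVQ, IVT magnitude) for one model column."""
    tuq = integrated_flux(u, q, dp)
    tvq = integrated_flux(v, q, dp)
    return tuq, tvq, math.hypot(tuq, tvq)


# Hypothetical two-layer column with purely zonal flow.
tuq, tvq, ivt = water_vapor_transport(
    u=[10.0, 10.0], v=[0.0, 0.0], q=[0.01, 0.005], dp=[50000.0, 50000.0]
)
```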
Finally, the tropical cyclone genesis potential index (GPI) shown in Figure 12 is defined using the large-scale vorticity, vertical wind shear, potential intensity (PI), and humidity fields following Camargo et al. (2007):

$$GPI = \left|10^5\, \eta\right|^{3/2} \left(\frac{H_{700}}{50}\right)^{3} \left(\frac{V_{pot}}{70}\right)^{3} \left(1 + 0.1\, V_{shear}\right)^{-2}, \tag{B7}$$

where η is the absolute vorticity at 850 hPa in s⁻¹, $H_{700}$ is the relative humidity at 700 hPa in percent, and $V_{pot}$ is the potential intensity (PI) in m s⁻¹, computed with the method proposed by Emanuel (2000) and Knutson et al. (2013). $V_{shear}$ is the magnitude of the vertical wind shear between 850 and 200 hPa in m s⁻¹.
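As a quick numerical check of the GPI definition, the index can be evaluated pointwise as in the sketch below. It assumes the widely used form of the index from Camargo et al. (2007), with the potential intensity supplied as an input rather than computed; the function and argument names are hypothetical.

```python
def genesis_potential_index(abs_vort_850: float, rel_hum_700: float,
                            v_pot: float, v_shear: float) -> float:
    """Tropical cyclone genesis potential index (unitless).

    abs_vort_850: absolute vorticity at 850 hPa (s^-1)
    rel_hum_700:  relative humidity at 700 hPa (percent)
    v_pot:        potential intensity (m s^-1)
    v_shear:      850-200 hPa vertical wind shear magnitude (m s^-1)
    """
    return (
        abs(1.0e5 * abs_vort_850) ** 1.5
        * (rel_hum_700 / 50.0) ** 3
        * (v_pot / 70.0) ** 3
        * (1.0 + 0.1 * v_shear) ** -2
    )


# Reference state with every factor equal to one, then the same state with shear.
gpi_reference = genesis_potential_index(1.0e-5, 50.0, 70.0, 0.0)
gpi_sheared = genesis_potential_index(1.0e-5, 50.0, 70.0, 10.0)
```

With 10 m s⁻¹ of shear the index drops to a quarter of its reference value, which shows how strongly the shear term suppresses genesis potential in this formulation.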

Figure C2. Probability density function (PDF) of monthly mean temperature (ΔT, unit: K, panel a) and humidity (ΔQ, unit: g kg⁻¹, panel b) differences between present-day and pseudo global warming simulations conducted with EAMv2. Shown are SSP245-CLIM (dashed blue line) and SSP585-CLIM (dashed red line) for EAMv2 simulations without ML bias correction, as well as ML (SSP245)-ML (CLIM) (solid blue line) and ML (SSP585)-ML (CLIM) (solid red line) for EAMv2 simulations with ML bias correction. The monthly mean data from each simulation during the 1979-2014 period were used to derive the metrics. Detailed descriptions of the simulations can be found in Table 1.

Data Availability Statement
The source code for EAMv2 (E3SM Project, 2021) used for simulations in this study was obtained from the Energy Exascale Earth System Model project, sponsored by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research. The TempestExtremes package used for feature tracking of extreme weather events was obtained from GitHub at https://github.com/ClimateGlobalChange/tempestextremes (P. Ullrich, 2022), and the user guide for this package can be found at https://climate.ucdavis.edu/tempestextremes.php (P. A. Ullrich et al., 2021). The CMIP6 data used to derive the climate change perturbations of sea surface temperature (SST) and sea-ice concentration (SIC) are available at https://esgf-node.llnl.gov/projects/cmip6/ (Eyring et al., 2016; O'Neill et al., 2016). The ERA5 reanalysis data used for machine learning training and evaluation in this study are available at the Copernicus Climate Change Service (C3S) Climate Data Store via https://doi.org/10.24381/cds.bd0915c6 (Hersbach et al., 2020). The scripts and post-processed data for the analyses in this study can be found on Zenodo at https://zenodo.org/doi/10.5281/zenodo.11053624 (S. Zhang & Charalampopoulos, 2024).

Figure 1. Zonal and annual mean model biases in the zonal wind (U, unit: m s⁻¹, panels a-b), temperature (T, unit: K, panels d-e), and water vapor mixing ratio (Q, unit: g kg⁻¹, panels g-h) averaged from 1979 to 2014. Shown are the results from free-running (i.e., CLIM, first column) and ML bias-corrected (i.e., ML (CLIM), second column) EAMv2 simulations. The biases are derived by comparing annual mean quantities from the EAMv2 simulations with those from the ERA5 reanalysis. The dotted region indicates that the differences are significant at the 95% confidence level according to a Student's t-test. The third column shows the standard deviation of U, T, and Q at each pressure level from the ERA5 reanalysis. A log-linear algorithm is used to interpolate the EAMv2 data from the hybrid sigma-pressure levels to pressure levels for comparison with the ERA5 reanalysis. Details of the simulation setups can be found in Table 1.

Figure 2. First row: mean biases in selected physical quantities averaged over the globe (panel a) and mid-latitude region ([30-60N] and [30-60S] latitude bands, panel c) from EAMv2 simulations without (i.e., "CLIM" in red) and with (i.e., "ML (CLIM)" in blue) ML bias correction, normalized by the observed value (i.e., ERA5 reanalysis). Second row: same as the first row, but for root-mean-square errors (RMSE) of anomaly patterns between EAMv2 simulations and observations, normalized by the root-mean-square (RMS) of the observed values. All metrics are calculated using the monthly mean model output and ERA5 reanalysis (i.e., observation) from 1979 to 2014. The y-axis shows the selected physical quantities, including surface pressure (PS, unit: hPa), sea level pressure (PSL, unit: hPa), zonal wind (U, unit: m s⁻¹) and temperature (T, unit: K) at the bottom model level and the 850-, 500-, and 200-hPa pressure levels, as well as the specific humidity (Q, unit: g kg⁻¹) at the 925-, 850-, 500-, and 200-hPa pressure levels. Log-linear interpolation is used to regrid the EAM model output from the hybrid sigma-pressure levels to pressure levels for comparison with the ERA5 reanalysis on pressure levels. Details of the simulation setups can be found in Table 1.

Figure 3. Differences of air temperature at 850 hPa (unit: K) between the present-day and pseudo global warming (PGW) EAMv2 simulations averaged over the whole simulation period of 1979-2014. Shown are panels (a) SSP245-CLIM and (c) SSP585-CLIM for EAMv2 simulations without ML bias correction, as well as (b) ML (SSP245)-ML (CLIM) and (d) ML (SSP585)-ML (CLIM) for EAMv2 simulations with ML bias correction. The dotted regions indicate that the differences are significant at the 95% confidence level. SSP245 and SSP585 denote two PGW EAMv2 simulations with imposed climate change perturbations in sea surface temperature (SST) and sea-ice concentration (SIC) derived from CMIP6 historical and future simulations following the SSP2-4.5 and SSP5-8.5 scenarios, respectively. Detailed descriptions of the simulations can be found in Table 1.

Figure 4. Long-term statistics for monthly mean air temperature (first column, unit: K) and specific humidity (second column, unit: g kg⁻¹) at the 850-hPa pressure level over the whole global domain during the simulation period of 1979-2014. Shown is the comparison among the ERA5 reanalysis (gray bars) and uncorrected (dashed lines) and ML-corrected (solid lines) EAMv2 simulations for present-day (blue lines) and future climate scenarios (red lines). The future climate simulations with SSP2-4.5 and SSP5-8.5 perturbations are shown in the top and bottom rows, respectively. Detailed descriptions of the simulations can be found in Table 1.

Figure 5. Top row: distribution of the vertically integrated horizontal water vapor transport (IVT, unit: kg m⁻¹ s⁻¹) from the ERA5 reanalysis averaged over all identified AR events at each grid point during 1979-2014 (panel a), and the model biases in the EAMv2 simulations without (i.e., CLIM, panel b) and with (i.e., ML (CLIM), panel c) ML bias correction. The AR events and composite IVT are tracked with TempestExtremes using the 6-hourly TUQ and TVQ data from the ERA5 reanalysis and EAMv2 simulations. Bottom row: same as the top row, but for the annual AR occurrence frequency (unit: %) in the ERA5 reanalysis (panel d) and the mean biases in CLIM (panel e) and ML (CLIM) (panel f). The annual frequency of ARs is defined as the number of time steps (6 hr) during which a grid point was part of an AR, divided by the total number of 6-hr time steps in each year during 1979-2014, expressed as a percentage. The dotted region in panels (b-c and e-f) indicates that the differences between the EAMv2 simulation and ERA5 are significant at the 95% confidence level.

Figure 6. Changes of the vertically integrated horizontal water vapor transport (IVT, unit: kg m⁻¹ s⁻¹, top row) and annual occurrence frequency of ARs (unit: %, bottom row) in the EAMv2 simulations with imposed climate change perturbations in sea surface temperature (SST) and sea-ice concentration (SIC) for the SSP2-4.5 scenario. Shown are the differences of SSP245-CLIM (panels a and c) and ML (SSP245)-ML (CLIM) (panels b and d) for the EAMv2 simulations without and with ML bias correction, respectively. The composite of IVT is derived using all AR events tracked by TempestExtremes during the present-day or pseudo global warming period of 1979-2014. The definition of the annual frequency of ARs is the same as in Figure 5. The dotted region indicates that the differences are significant at the 95% confidence level.

Figure 8. Composites of meteorological quantities centered on the ETC storm center for all filtered storms with mean sea level pressure (PSL) less than or equal to 990 hPa in the ERA5 reanalysis (first column), and the differences between EAMv2 simulations and the ERA5 reanalysis before (i.e., CLIM, second column) and after (i.e., ML (CLIM), third column) applying ML bias correction. The top row shows the composite of air temperature (contour, unit: K) and wind (vector, unit: m s⁻¹) at the 850-hPa pressure level for (a) ERA5 reanalysis, (b) CLIM-ERA5, and (c) ML (CLIM)-ERA5; the bottom row shows the integrated vapor transport (IVT, unit: kg m⁻¹ s⁻¹) for (d) ERA5 reanalysis, (e) CLIM-ERA5, and (f) ML (CLIM)-ERA5. All ETCs tracked with 6-hourly PSL fields during 1979-2014 are included in the composite after filtering out the storms with central PSL > 990 hPa. The white (panels a-c) and black (panels d-f) cross markers indicate the center of the ETC storms.

Figure 9. Responses (Δ) of annual ETC track densities over the Northern Hemisphere (NH) to the imposed climate change perturbations in sea surface temperature (SST) and sea-ice concentration (SIC) for the SSP2-4.5 (top row) and SSP5-8.5 (bottom row) scenarios. Shown are the differences of SSP245-CLIM (panel a) and SSP585-CLIM (panel c) for the EAMv2 simulations without ML bias correction, as well as ML (SSP245)-ML (CLIM) (panel b) and ML (SSP585)-ML (CLIM) (panel d) for the EAMv2 simulations with ML bias correction. All ETCs with mean sea level pressure (PSL) less than or equal to 990 hPa are included in the metrics. The ETCs are tracked with 6-hourly PSL fields from EAMv2 simulations from 1979 to 2014. The annual track density is defined as the total number of time steps (6 hr) during which ETCs passed over an 8° × 8° grid box per year. The dotted region indicates that the differences are significant at the 90% confidence level.

Figure 10. Responses (Δ) of the composite mean sea level pressure (PSL, unit: hPa) centered on the ETC storm center for all filtered storms in EAMv2 future climate simulations following the SSP2-4.5 (first row) and SSP5-8.5 (second row) emission scenarios. The Δs are derived by subtracting the composite PSL in the CLIM simulations from that in the SSP245 and SSP585 simulations. All ETCs tracked with 6-hourly PSL fields during 1979-2014 are included in the composite after filtering out the storms with central PSL > 990 hPa. The white cross markers (panels a-d) indicate the center of the ETC storms.

Figure 11. Same as Figure 10 but for responses (Δ) of temperature (blue contours, unit: K) and wind (black vectors) at the 850-hPa pressure level as well as the vertically integrated vapor transport (IVT, unit: kg m⁻¹ s⁻¹) to climate change following the SSP2-4.5 (first row) and SSP5-8.5 (second row) emission scenarios. See Table 1 for a detailed description of the EAMv2 simulations.

Figure 12. First column: seasonal mean tropical cyclone Genesis Potential Index (GPI, unitless, panel a), potential intensity (PI, unit: m s⁻¹, panel d), and vertical wind shear between 200 and 850 hPa (unit: m s⁻¹, panel g) from the ERA5 reanalysis averaged from 1979 to 2014. Second column: bias in GPI (panel b), PI (panel e), and 200-850 hPa vertical wind shear (panel h) in the EAMv2 simulation without ML bias correction (i.e., CLIM-ERA5). Third column: same as the second column but for the EAMv2 simulations with ML bias correction (i.e., ML (CLIM)-ERA5, panels c, f, i). The monthly mean output from the ERA5 reanalysis and EAMv2 simulations is used to calculate the GPI, PI, and vertical wind shear following Equation B7. For all panels, the seasonal mean values are computed for August to October in the Northern Hemisphere and for January to March in the Southern Hemisphere. A detailed description of the EAMv2 simulations can be found in Table 1.

Figure A1. Changes of annual mean sea surface temperature (SST, unit: K, panels a, c) and sea-ice concentration (SIC, unit: %, panels b, d) in response to the forcing pathways of SSP2-4.5 (top row) and SSP5-8.5 (bottom row) from CMIP6 coupled model simulations. Shown are the multi-model ensemble mean climatological differences averaged over the 15 models listed in Table A1. The climatological differences are computed with the output from coupled historical simulations during 1991-2010 and future climate simulations during 2041-2060. More detailed descriptions of the simulations and models in Table A1 can be found in Eyring et al. (2016) and O'Neill et al. (2016).

Figure C3. Same as Figure 6, but for the pseudo global warming simulations conducted with EAMv2 using imposed climate change perturbations in SST and SIC derived from the SSP5-8.5 future scenario.

Figure C4. Horizontal distribution of mean sea level pressure averaged from 1979 to 2014 from the ERA5 reanalysis (panel a), and mean model biases in EAMv2 free-running simulations without (i.e., CLIM, panel b) and with (i.e., ML (CLIM), panel c) ML bias correction. The dotted regions in panels (b, c) indicate that the differences between the model and the ERA5 reanalysis are significant at the 95% confidence level.

Figure C5. Same as Figure 9 but for the responses of ETC track density to imposed climate change perturbations in sea surface temperature (SST) and sea-ice concentration (SIC) from SSP2-4.5 (top row) and SSP5-8.5 (bottom row) over the Southern Hemisphere.

Figure C6. Top row: track density maps for tropical cyclones (TCs) tracked in the ERA5 reanalysis (panel a) and EAMv2 free-running simulations without (i.e., CLIM, panel b) and with (i.e., ML (CLIM), panel c) ML bias correction. The 6-hourly sea level pressure (PSL) data from 1979 to 2014 are used to track the TC-like vortices at each model grid point using TempestExtremes. The TC track density is defined as the average number of 6-hourly TC track locations within a 4° × 4° grid box per year. Bottom row: climatological mean distribution of the number of TCs falling into each category of the Saffir-Simpson wind scale (d) and the normalized probability distribution function (PDF) of the 10-m maximum wind speed (e) in IBTrACS observations (gray bars) and EAMv2 simulations without (brown bars and lines) and with (green bars and lines) ML bias correction. The x-axis in panel (d) corresponds to the Saffir-Simpson wind scale: TS, tropical storm (17.5-32 m s⁻¹); Cat1, Category 1 (33-42 m s⁻¹); Cat2, Category 2 (43-49 m s⁻¹); Cat3, Category 3 (50-58 m s⁻¹); Cat4, Category 4 (59-69 m s⁻¹); Cat5, Category 5 (>69 m s⁻¹). IBTrACS refers to the International Best Track Archive for Climate Stewardship, which contains the TC track and intensity data from historical observations. The statistics in panels (d-e) are obtained with 6-hourly data from 1979 to 2014.

Table 1
List of Simulations for Present-Day and Future Climate Scenarios

Table A1
List of CMIP6 Models Used to Derive the Imposed Climate Change Perturbations in Sea Surface Temperature (SST) and Sea Ice Concentrations (SIC) for Pseudo Global Warming Simulations in Table 1

Table B1
List of Extreme Weather Events and Model Variables Used for Feature Tracking by the TempestExtremes Package
Note. The EAM model output used to derive the feature quantities is listed in the fourth column. See context in Appendix C for details.