Extreme hot days over three global mega‐regions: Historical fidelity and future projection

Using a downscaled high‐resolution NASA Earth Exchange Global Daily Downscaled Projections (NEX‐GDDP) dataset based on Coupled Model Intercomparison Project Phase 5 (CMIP5) simulations, this study evaluated and compared extreme hot days (EHDs) over the three mega‐regions [the Eastern United States (EUS), Europe (EU) and Eastern Asia (EA)] during the historical period (1981–2005) against observations, resulting in a subset of models with high skill for the past climatology and trend. The observed EHDs over EU exhibit the largest absolute amount and the most significant increases in frequency (4.0 days·decade−1), intensity (0.22°C·decade−1) and extent (8.4°C·days·decade−1), while no significant trend over EUS is found. Compared with the observation, the largest bias in NEX‐GDDP is the remarkably overestimated increase in the trend over EUS. In the RCP8.5 projection using six models with relatively high fidelity, the increase of EHDs is evidently enhanced during 2030–2054 over the three regions, particularly EUS. The projected trend of EHDs over EUS could be uncertain because of the modelling biases in aerosol effects and internal variation, which is worthy of further investigation in CMIP6.


| INTRODUCTION
Extreme hot days (EHDs) pose severe threats to human health, infrastructure, agriculture and other socio-economic systems (Easterling et al., 2000; Perkins, 2015), and the associated damages are usually aggravated over certain mega-regions with rapid economic growth and high population density (IPCC, 2014). Under global warming, both heatwaves and warm spells are being reported with increasing frequency, intensity and duration worldwide (e.g. Perkins et al., 2012; Fischer and Knutti, 2015; Gao et al., 2019), and the area affected by EHDs has also increased remarkably in recent decades (Russo et al., 2014; Christidis et al., 2015). Therefore, reliable future projection of EHDs is crucial for regional and global disaster mitigation and prevention.
Using global climate models (GCMs), several studies have attempted to project future change in extreme high temperature over certain regions using different extreme indicators. According to 31 CMIP5 GCMs, the overall frequency and severity of extreme heat events throughout the United States will increase in the future, and the associated spatial distributions differ depending on the metrics (absolute or relative) used (Wobus et al., 2018). Using the 90th percentile of a 21-day-long moving window over 1981-2005, four GCMs from CMIP5 show increasing frequency, duration, intensity and severity of EHDs in the coming decades under RCP8.5 over China. Evident increases in the duration, intensity and frequency were also projected by CMIP5 models for EHDs over western Europe based on a percentile-based EHD definition (Schoetter et al., 2015). Most previous studies, such as those mentioned above, have focused on projections for individual regions. One exception is the work by Meehl and Tebaldi (2004), who compared the future behaviours of heatwaves between Europe and North America using a single GCM.
Regional climate models (RCMs) have also been applied to project EHDs, with the advantage of finer spatial resolution (e.g., Lhotka et al., 2018). Fischer and Schär (2010) reported that health-associated heatwaves measured by daily temperature and health-related heatwave index will become more severe over Southern Europe, along the Mediterranean coasts and in urban centres, based on a high-resolution (25 km) RCM. However, future projections of extreme high temperature from regional models depend strongly on the boundary conditions supplied to them, and thus carry large uncertainties (Zobel et al., 2017).
Previous studies have projected the future change in EHDs using either coarse GCMs or fine-resolution RCMs for certain individual regions. However, parallel comparisons among important regions, based on a unified criterion and on fine-resolution data from multiple GCMs that demonstrate high skill in simulating past climate, are rare in the literature. Such a comparison is important for communication and cooperation with respect to climate across multiple regions. A high-resolution (0.25° × 0.25°) dataset named NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP), downscaled from CMIP5 output, has been released (Thrasher et al., 2012, 2013), which provides a new opportunity to fill the above knowledge gap. The Eastern United States (EUS), Europe (EU) and Eastern Asia (EA) are three extratropical mega-regions in the Northern Hemisphere at roughly similar latitudes. All are regional centres for both politics and economics, include several super cities and have large population densities (Florida et al., 2008). Under global warming, these three mega-regions have relatively high vulnerability to extreme climate (Zhang and Zhou, 2020) and, given all the above, are the target regions of this study. More specifically, this study evaluates and compares these mega-regions' historical features of EHDs in NEX-GDDP using unified EHD criteria. Furthermore, the models with the best performance over the three target regions were selected, and the future changes in EHDs were projected using the NEX-GDDP data derived from these selected models. Based on the unified criteria, this study aims to address the following issues: (a) the historical characteristics of EHDs over the three mega-regions and their regional comparisons in observations; (b) the fidelity of EHD features in the NEX-GDDP data over the three mega-regions; and (c) the projection of future EHDs over the three mega-regions based on the selected models, and the associated uncertainty.

| Data
Two main observational datasets were applied in this study. To identify the observed historical EHD features, the daily maximum temperature (T max) data, with a horizontal resolution of 1° and spanning 1951-2005, were retrieved from Berkeley's Earth Surface Temperature (BEST) database (Rohde et al., 2013a; 2013b). We also examined the results after interpolating the BEST dataset to a resolution of 0.25° × 0.25°, similar to that of NEX-GDDP, which did not reveal any great difference in the results, as shown in Figure S1. To choose mega-regions with high-level human activities, the DMSP-OLS4 2013 nighttime stable light dataset was retrieved from the Earth Observation Program of NOAA's National Geophysical Data Centre (available from https://www.ngdc.noaa.gov/eog/), which has a 0.1° horizontal resolution and covers the region from 0° to 75°N. This dataset measures artificial light from cities, towns and other continuous lighting areas at night, in the form of a digital number ranging from 0 to 63. These light data can indicate the population density of an area and have been extensively used in socio-economic research (Chen and Nordhaus, 2011) and urban extent mapping (Zhou et al., 2015), and to assess the distribution of energy consumption (Jin et al., 2019) and population (Zhuo et al., 2009). Here, we used them to choose mega-regions with high-level human activities by identifying areas exceeding the 95th percentile of digital number values in the Northern Hemisphere.
The NEX-GDDP dataset was originally derived from a subset of GCMs that participated in CMIP5 (Thrasher et al., 2012). In the NEX-GDDP dataset, 20 CMIP5 GCMs were downscaled to a spatial resolution of 0.25° × 0.25° by using the bias-correction and spatial disaggregation method (Wood et al., 2004; Maurer and Hidalgo, 2008) based on the r1i1p1 output of each model. We specifically chose those models that can realistically simulate EHDs in the historical run and provide two projections under the RCP4.5 and RCP8.5 emission scenarios for 2006-2099 (Meinshausen et al., 2011). We also obtained the CMIP5 models, with horizontal resolutions ranging from 0.94° × 1.25° to 2.79° × 2.85°, as shown in Table S1, for comparison.

| Methods
An EHD refers to a day with T max exceeding a threshold or the hottest day in 1 month or 1 year (Perkins, 2015; Frölicher et al., 2018). Since this study aims to compare the characteristics of EHDs among different regions, a regional percentile-based approach was adopted. A day with T max exceeding the 90th percentile, calculated from a 5-day window centred on that calendar day over the base period of 1961-1990 in each grid cell, was counted as one EHD, which is a widely used approach in many studies (e.g. Karl et al., 1999; Peterson et al., 2001). This study focuses on EHDs in boreal summer [June-August (JJA)].
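The percentile-based definition above can be sketched as follows for a single grid cell. This is a minimal illustration rather than the processing code used in the study: the function names, the `(n_years, n_days)` array layout and the truncation of the window at the edges of the season are all assumptions made here.

```python
import numpy as np

def ehd_threshold(tmax, years, base=(1961, 1990), window=5, q=90.0):
    """Calendar-day threshold for one grid cell.

    tmax  : array (n_years, n_days) of daily maximum temperature in JJA
    years : array (n_years,) with the calendar year of each row
    Returns an array (n_days,) holding the q-th percentile of all base-period
    values that fall inside a `window`-day window centred on each day.
    Assumption: the window is simply truncated at the season edges.
    """
    tmax = np.asarray(tmax, dtype=float)
    years = np.asarray(years)
    in_base = (years >= base[0]) & (years <= base[1])
    base_data = tmax[in_base]
    n_days = tmax.shape[1]
    half = window // 2
    thr = np.empty(n_days)
    for d in range(n_days):
        lo, hi = max(0, d - half), min(n_days, d + half + 1)
        # Pool every base-period value inside the window for this day
        thr[d] = np.percentile(base_data[:, lo:hi], q)
    return thr

def flag_ehd(tmax, thr):
    """Boolean array (n_years, n_days): True where T max exceeds the threshold."""
    return np.asarray(tmax, dtype=float) > np.asarray(thr, dtype=float)
```

In practice one would probably pad the season with a few days of late May and early September so that the first and last summer days also get a full 5-day window.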
To quantitatively compare the EHD features among the mega-regions, three indices (frequency, intensity and extent) were applied in the observation, historical simulations and future projections. The frequency (in days) is defined as the number of EHDs; the intensity (in °C) is equal to the T max anomaly against the 90th percentile threshold averaged over the EHDs in each summer; and the extent (in °C·days) describes the accumulated intensity in all EHDs, calculated as the frequency multiplied by the intensity in each summer.
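As an illustration of the three indices, the sketch below computes them per summer for one grid cell. The function name and array layout are hypothetical, and setting the intensity to zero in a summer without any EHD is an assumption made here, not something stated in the text.

```python
import numpy as np

def ehd_indices(tmax, thr):
    """Frequency, intensity and extent of EHDs for each summer.

    tmax : array (n_years, n_days) of daily maximum temperature
    thr  : array (n_days,) of 90th-percentile thresholds
    Returns three arrays of length n_years.
    """
    excess = np.asarray(tmax, dtype=float) - np.asarray(thr, dtype=float)
    is_ehd = excess > 0
    # Frequency: number of EHDs in each summer (days)
    frequency = is_ehd.sum(axis=1).astype(float)
    # Intensity: mean exceedance above the threshold over the EHDs (deg C);
    # assumed to be 0 when a summer has no EHD
    total_excess = np.where(is_ehd, excess, 0.0).sum(axis=1)
    intensity = np.divide(total_excess, frequency,
                          out=np.zeros_like(total_excess),
                          where=frequency > 0)
    # Extent: frequency multiplied by intensity (deg C * days)
    extent = frequency * intensity
    return frequency, intensity, extent
```

By construction the extent equals the summed exceedance over all EHDs in a summer, which is why it can be read as an accumulated intensity.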
Using these three indices, this study focuses on evaluating two aspects of EHD features in the observation and NEX-GDDP data: the domain-averaged climatology and the linear trend of EHDs. The trends of EHD features were estimated by the ordinary least-squares technique, and their statistical significance was evaluated using the Student's t test. The frequency, intensity and extent of EHDs were first calculated for each grid cell and then averaged over the land areas of the three mega-regions. The data were weighted by the cosine of the latitude when calculating the domain average.
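The domain averaging and trend estimation described above might look like the following sketch. The function names and inputs are hypothetical; the t statistic returned here would then be compared with a Student's t distribution with n − 2 degrees of freedom to obtain the significance level.

```python
import numpy as np

def area_mean(field, lats, land_mask):
    """Cosine-latitude weighted mean of `field` over land cells.

    field     : array (n_lat, n_lon)
    lats      : array (n_lat,) of latitudes in degrees
    land_mask : boolean array (n_lat, n_lon), True over land
    """
    weights = np.cos(np.deg2rad(np.asarray(lats, dtype=float)))[:, None] * land_mask
    values = np.where(land_mask, field, 0.0)  # ignore ocean cells, even if NaN
    return float((values * weights).sum() / weights.sum())

def ols_trend(years, values):
    """Ordinary least-squares trend (per decade) and its t statistic."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    n = x.size
    xm, ym = x.mean(), y.mean()
    sxx = ((x - xm) ** 2).sum()
    slope = ((x - xm) * (y - ym)).sum() / sxx           # units per year
    resid = y - (ym + slope * (x - xm))
    se = np.sqrt((resid ** 2).sum() / (n - 2) / sxx)    # standard error of slope
    t_stat = slope / se if se > 0 else float("inf")
    return slope * 10.0, t_stat
```

Averaging the grid-cell indices first and fitting the trend to the resulting domain-mean series matches the order of operations given in the text.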
To reduce the bias in the future projection and narrow the model-based uncertainty, we specifically selected the "best" models as optimum models according to the historical evaluation. We evaluated the fidelity of EHD features in 20 NEX-GDDP models during 1981-2005 based on the following two metrics. One, for the mean state, we refer to as the mean state fidelity (MSF); and the other, for the trend, is called the trend fidelity (TF). Their definitions are as follows:

$$\mathrm{MSF}_i = \frac{x_i}{x_o},$$

where $x_i$ and $x_o$ are the simulated and observed climatology, respectively, for model $i$ and the observation during the given period:

$$\mathrm{TF}_i = y_i - y_o,$$

where $y_i$ and $y_o$ are the simulated and observed trends, respectively, for model $i$ during the given period. In addition, a trend ratio was calculated by dividing the EHD trend in the future projection (2030-2054) by that in the historical run.
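A small numerical sketch of the two fidelity metrics and the trend ratio is given below. It assumes that MSF is the ratio of simulated to observed climatology and that TF is the simulated minus the observed trend, which is the reading consistent with the selection thresholds quoted later in the paper (MSF between 0.75 and 1.25 for frequency, TF expressed in days·decade−1). The function names and the example numbers are hypothetical.

```python
def mean_state_fidelity(x_model, x_obs):
    """Assumed form: ratio of simulated to observed climatology (1.0 is perfect)."""
    return x_model / x_obs

def trend_fidelity(y_model, y_obs):
    """Assumed form: simulated minus observed trend (0.0 is perfect)."""
    return y_model - y_obs

def trend_ratio(future_trend, historical_trend):
    """Future (2030-2054) trend divided by the historical-run trend."""
    return future_trend / historical_trend

# Hypothetical values for one model, one region and the frequency index
msf = mean_state_fidelity(12.0, 10.0)   # model climatology 20% too high
tf = trend_fidelity(3.5, 1.0)           # trend overestimated by 2.5 days/decade
ratio = trend_ratio(6.0, 2.0)           # future trend three times the historical one
```

A trend ratio above 1 then indicates that the projected trend is steeper than the simulated historical one.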

| Observed climatology and change in EHDs over the mega-regions
Based on the descriptions of the three mega-regions in Section 1, the night-time stable lighting levels in 2013 were used to obtain the spatial distribution of the intensity of human activity, and the 95th percentile in the Northern Hemisphere was the threshold for selecting the mega-regions (Figure 1). According to this criterion, the three domains of the EUS (100°-70°W, 30°-45°N), EU (10°W-25°E, 36°-58°N) and EA (110°-140°E, 30°-41°N) were chosen as the target areas.
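The thresholding step and the three rectangular domains can be illustrated with the sketch below. The region bounds are taken from the text; the function names, the convention of negative longitudes for degrees west and the regular latitude-longitude grid are assumptions of this example.

```python
import numpy as np

# (lon_min, lon_max, lat_min, lat_max); west longitudes are negative (assumed)
REGIONS = {
    "EUS": (-100.0, -70.0, 30.0, 45.0),
    "EU": (-10.0, 25.0, 36.0, 58.0),
    "EA": (110.0, 140.0, 30.0, 41.0),
}

def bright_mask(dn, q=95.0):
    """True where the night-light digital number exceeds its q-th percentile."""
    dn = np.asarray(dn, dtype=float)
    return dn > np.percentile(dn, q)

def region_mask(lats, lons, box):
    """Boolean (n_lat, n_lon) mask of the grid cells inside a lon/lat box."""
    lon_min, lon_max, lat_min, lat_max = box
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    lat_ok = (lats >= lat_min) & (lats <= lat_max)
    lon_ok = (lons >= lon_min) & (lons <= lon_max)
    return lat_ok[:, None] & lon_ok[None, :]
```

Combining `region_mask` with a land mask would give the cells over which the EHD indices are averaged for each mega-region.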
Using the BEST dataset, three indicators were applied to identify the characteristics of EHDs over the three mega-regions. The frequency, intensity and extent have been defined in Section 2, and here their domain-averaged and seasonal climatology are analysed. Figure 2a demonstrates the climatology of EHD features during boreal summer (JJA) over the three targeted regions. We specifically examined the two periods of 1951-2005 and 1981-2005 because the phenomenon of global warming became significant after 1981 (Watanabe et al., 2014; IPCC, 2018).
FIGURE 2 (caption fragment) [...] (during 1981-2005) in 20 and 6 optimal models. The asterisk symbols '**' and '*' indicate the linear trends are significant at the 99% and 95% confidence levels, respectively
As shown in Figure 2a, an almost identical frequency of EHDs occurs over the three regions, with a domain average of 10 days in each summer of 1951-2005, whereas during 1981-2005 the EHD frequency over EU is slightly higher (by 2-3 days) than that in the other two mega-regions. In terms of the intensity, the EHDs over EU exhibit much more severe features than in the other two mega-regions. The intensity (1.11/1.26°C during 1951-2005/1981-2005) over EU is much higher than those (0.79/0.80°C in EUS; 0.75/0.78°C in EA) of the other two mega-regions. For their cumulative impact, the extent of EHDs over EU is nearly twice that of EUS and EA in 1981-2005.
The trends of the three indicators during the two periods over the three mega-regions are also shown in Fig [...] is also significant, being comparable with that over the EU region in the same period. However, the EA region has no significant trend during 1951-2005, which means that the increasing trend of EHDs over EA became significant only after 1981.
As a summary of the historical features over the mega-regions, the EU region exhibits the largest absolute amount and the most significant increase in EHD frequency, intensity and extent among the three mega-regions. The EHD indices over EUS show no significant trend, and the increasing trend is evidently enhanced after 1981 over EU and EA, especially the latter.

| Climatology and change in EHDs in NEX-GDDP
To evaluate if the NEX-GDDP dataset can reproduce the above-mentioned historical characteristics and regional differences in EHDs over the three mega-regions, we examined the NEX-GDDP multi-model ensemble (MME) results in terms of the three indicators for both the climatology and the trend fidelity, as shown in Figure 2b. The MSFs for the three indicators show that the observed climatologies over the three regions are fairly well captured by NEX-GDDP, and the EA region exhibits the most accurate representation in terms of the MME. The amounts of the three indicators are all overestimated over EUS/EA but underestimated over EU, particularly for the EHD extent.
For the TFs over the three mega-regions, the EUS region shows no trends in the observation but has a spurious, significant increasing trend in the NEX-GDDP MME. Meanwhile, the increasing trends over EU and EA are more realistically captured but underestimated. Therefore, the maximum bias in terms of the trends is the remarkably overestimated increase in the trend over EUS. To reduce the errors in the EHDs over the three mega-regions, we selected optimum models by identifying those that capture the historical features most realistically, on the premise that their future projections could also be considered the most reliable.

| Selection of models for future projection
To select optimum models for the future projection of EHDs over the mega-regions, we applied two common basic criteria based on the above three indices. The optimum models should be able to realistically reproduce the observed historical climatology and trend of EHDs simultaneously over the three domains for the three indices. Specifically, if the biases of the mean state over each region by a specific model were within a quartile/tertile/median for frequency/intensity/extent and, at the same time, the trend biases were within 3 (days·decade−1)/0.15 (°C·decade−1)/6 (°C·days·decade−1) for frequency/intensity/extent compared to the BEST observations, this model was chosen as an optimum model for the future projection. According to the above two criteria, MSFs should range from 0.75/0.67/0.5 to 1.25/1.33/1.5 for frequency/intensity/extent, and the absolute values of TFs should be below 3/0.15/6 for frequency/intensity/extent. The estimated MSFs and TFs for all the models considered are displayed in Figure 3. The models that are located within the dashed green box meet the above two criteria. For frequency, eight models met the criteria, while seven models were selected for intensity and nine for extent. Considering all three regions and three indices, six common optimum models (BCC-CSM1-1, CanESM2, GFDL-ESM2M, MIROC-ESM, MPI-ESM-MR and NorESM1-M) were ultimately selected for further analysis.
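The selection rule can be sketched as a simple filter over per-model MSF and TF values. This is an illustrative reading of the two criteria, not the authors' code: the function name and input layout are hypothetical, the intensity trend tolerance is taken as 0.15 °C·decade−1 as given in the statement of the trend-bias criterion, and values exactly on a boundary are treated as passing.

```python
import numpy as np

# Tolerances for |MSF - 1| and |TF| per index, taken from the criteria above
MSF_TOL = {"frequency": 0.25, "intensity": 0.33, "extent": 0.5}
TF_TOL = {"frequency": 3.0, "intensity": 0.15, "extent": 6.0}

def select_models(msf, tf, models):
    """Return the models meeting both criteria in every region and index.

    msf, tf : dicts mapping index name -> array (n_models, n_regions)
    models  : list of model names, in the same order as the array rows
    """
    ok = np.ones(len(models), dtype=bool)
    for index in MSF_TOL:
        msf_ok = np.abs(np.asarray(msf[index]) - 1.0) <= MSF_TOL[index]
        tf_ok = np.abs(np.asarray(tf[index])) <= TF_TOL[index]
        # A model must pass in all regions for this index
        ok &= (msf_ok & tf_ok).all(axis=1)
    return [name for name, keep in zip(models, ok) if keep]
```

Requiring a pass in every region and for every index is what makes the final set the intersection of the per-index selections, which is why it is smaller than any of the per-index counts.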
Using these six optimum models, we re-evaluated the fidelity of the EHD features in the historical simulations over the three mega-regions, as shown in Figure 2b. The most notable improvement is that the biases in the trend of EHDs over EUS are reduced from 2.59 to 1.36 (days·decade−1) in frequency, from 0.13 to 0.07 (°C·decade−1) in intensity and from 4.80 to 2.02 (°C·days·decade−1) in extent, although the overestimated increasing trends of the three EHD features still exist over EUS. Meanwhile, the MSF range bar, which represents the uncertainty of the result, is reduced from 0.67 to 0.22 in frequency over EUS during the historical period. In terms of TF, the range of the frequency biases also drops, from 5.4 to 2.4 (days·decade−1) over EUS, from 6.7 to 2.1 (days·decade−1) over EU, and from 8.1 to 2.8 (days·decade−1) over EA. Such a significant reduction in uncertainty can also clearly be seen in both the intensity and extent of EHDs (Figure 2b). Figure 2b also shows that the choice of GCMs for representing EHDs is more important for EUS and EU than for EA, as the differences between the ensemble means from the six models versus the 20 models are relatively small for the three indicators in EA.

| Projection of EHD characteristics using the optimum models
Here, we focus on two aspects of the EHD projections over the three mega-regions. The first is whether the trends of EHDs over each mega-region change for the three indicators under two future scenarios; and the second is whether the future trend is higher than the historical trend (measured by the trend ratio defined in Section 2.2). The projection outputs from both the RCP4.5 and RCP8.5 scenarios were used. The Paris Agreement set a preferable target to limit global warming to 1.5°C above preindustrial levels (United Nations Framework Convention on Climate Change, 1992), and many studies have projected extreme events in models for this 1.5°C warming level (Zhao and Zhou, 2019). Thus, for the comparison with the historical epoch, the period of 2030-2054 was chosen as the future epoch for the following analysis, during which global warming is likely to reach the 1.5°C level (IPCC SR15, 2018; Nangombe et al., 2019). Figure 4 shows the trend ratios for the three regions and the three indicators under the RCP4.5 and RCP8.5 scenarios. Relatively speaking, EU and EA show smaller changes compared to EUS under the two scenarios. The trend ratios under RCP8.5 are systematically higher than those under RCP4.5, but the regional differences under the two scenarios remain practically unchanged. The increases in the trends of the three indicators are detailed in Table S2.
FIGURE 3 (caption fragment) [...] in the three mega-regions. The boxes with green dashed lines show the boundaries of bias in mean state within 0.25/0.33/0.5 and trend within 3 (days·decade−1)/0.15 (°C·decade−1)/6 (°C·days·decade−1), respectively. The red circles indicate the optimal models selected to perform the "best" in the three regions simultaneously. The red '×' indicates the fidelity of the MME

| Discussion
Compared to the observation, the most remarkable uncertainty in most CMIP5 models of NEX-GDDP is the evidently overestimated increase in the EHD trends over EUS (Figure 2b). As Figure 2a shows, an insignificant warming, and even a cooling trend, over EUS is found in the observation, which has been referred to as a 'warming hole' in several previous studies (e.g., Perkins, 2015; Yu et al., 2014). This 'warming hole' might be attributable to two factors. One is the increased anthropogenic aerosol concentration during [...], which caused decreased surface radiation and an increased cloud fraction (Portmann et al., 2009; Leibensperger et al., 2012). The other is the natural internal variation that decelerates the warming trend (Kunkel et al., 2006; Weaver, 2013). Figure S3 shows that the long-term trend of EHDs over EUS is anti-correlated with the Interdecadal Pacific Oscillation (IPO), which is not the case over the other two mega-regions. The close relationship between the change in EHDs over EUS and the IPO indicates that the insignificant increase in EHDs over EUS could be the footprint of the IPO, which needs to be studied in depth in the future. Compared with the observation, the internal variation (e.g. the IPO) cannot be captured well by current MME modelling (Sheffield et al., 2013). Therefore, the models overestimate the weak observed warming trend over EUS. Additionally, a comparison of the epochal change between the optimum models and the other models in this study shows that the multi-model domain-averaged cloud fraction in the six best models is evidently larger than that in the six worst models (Figure S2). This indicates that current models may underestimate the cloud fraction, thereby spuriously increasing the downward solar radiation and causing an unrealistic warming trend.
NEX-GDDP is a statistical downscaling dataset based on the simulations of the GCMs from CMIP5 and historical observation. Compared with the original GCMs' outputs, the most evident improvement of NEX-GDDP is the reduced biases for both the climatological intensity and extent of EHDs over EUS and EA (blue star is closer to 1.0 along the x-axis in Figure 5b). Meanwhile, the uncertainties (boxes in Figure 5) among different models are also evidently reduced for the absolute intensity of EHDs over the three mega-regions, particularly over EUS and EA. However, the biases for the climatological intensity and extent over EU become larger in NEX-GDDP. At the same time, neither the absolute biases nor the uncertainty band for the EHD frequency in NEX-GDDP are reduced for the three mega-regions (Figure 5a). In terms of trends, NEX-GDDP does not have higher fidelity than the original GCMs for all three indicators over the three regions, as shown along the y-axis in Figure 5. The above comparison indicates that the downscaled NEX-GDDP performs better than CMIP5 models for the MSF of EHD intensity and extent, while the trends of EHDs have not been improved over the three mega-regions. As mentioned above, the observed regional trend is influenced by both anthropogenic aerosol and natural internal variation, which may not be improved by statistical downscaling. Additionally, the deficiencies of statistical downscaling methods could also contribute to the failure in improving the representation of trends (e.g. Li et al., 2018).
FIGURE 4 Ratios of the projected EHD trends under the RCP4.5 (red) and RCP8.5 (golden) scenarios to those of the historical run in the six optimal models for the three mega-regions

| Conclusions
A newly released high-resolution dataset, NEX-GDDP, based on CMIP5 model simulations and statistical downscaling, provides an opportunity to compare the EHDs among the three mega-regions (EUS, EU and EA) in historical simulations and future projections using unified indices (frequency, intensity and extent) of EHDs. Firstly, the observed historical features over the three mega-regions are presented and compared. Observationally, the EU region exhibits the largest amount and the most significant increase in the frequency, intensity and extent of EHDs among the three mega-regions, while the EHDs over EUS show no significant trend. Compared with the observed historical features, the maximum biases in NEX-GDDP are found in the remarkably overestimated increase in the trend over EUS, while other major features are realistically simulated and downscaled. To reduce the biases and increase the robustness of future projection, we selected six GCMs as optimum models. The ensemble means of the six models are shown to evidently reduce the original biases of the full multi-model results, although the overestimated increase in the trend over EUS still exists.
In the future projections, the increase in the trend of EHD intensity under RCP4.5 is smaller than that of the historical period over EU and EA. In contrast, under RCP8.5, the increase in the trends of EHDs over all the mega-regions is evidently enhanced (by two to three times), particularly over EUS, compared with the historical simulation. However, the projected increase in the trend over EUS could be uncertain because of modelling biases in aerosol effects and internal variation, which should be further investigated as part of the ongoing CMIP6 activity.
FIGURE 5 The MSF and TF of summer EHD (a) frequency, (b) intensity and (c) extent derived from 20 NEX-GDDP (blue dots) and CMIP5 (red dots) models during 1981-2005 in the three mega-regions. The blue and red stars in (a-c) denote the MME frequency, intensity and extent of the NEX-GDDP and CMIP5 models, respectively. The boxes with green and orange lines show the boundaries of bias among the 20 NEX-GDDP and CMIP5 models, respectively