Substantial changes in the probability of future annual temperature extremes

Extreme temperature events causing significant environmental and humanitarian impacts are expected to increase in frequency and magnitude due to global warming. The latest generation of climate model projections, Coupled Model Intercomparison Project Phase Six (CMIP6), provides a new and improved database to investigate change in future daily scale extreme temperature events. This study examines the changes in 1, 3, and 5 day averaged annual maximum temperature in four large CMIP6 ensembles. It analyses, using a generalized extreme value (GEV) method, the change in extreme daily mean temperatures at 1.5 and 2°C of global warming, levels highlighted by the 2016 Paris Agreement, and additionally at 3°C. Extremely hot events are characterized using the annual maxima of daily near surface air temperature in the SSP370 scenario. Global changes in the mode of the distributions (location parameter) follow long‐term summer warming and show very similar spatial patterns. Changes in variability (scale parameter) show a clear trend of increases over the tropics and decreases over higher latitudes, while changes to the tails of distributions (shape parameter) show less globally consistent trends but clear signals over the Arctic sea ice, behaviour also seen in variability. Risk ratios (RRs) indicating the change in probability of hot daily extremes that currently have a 10 year return period increase globally with mean temperature change, with greater increases over the tropics. Globally averaged changes in RR over land range from 3.1–3.6 to 7.9–8.3 for 1.5 and 3°C of warming, respectively. For the latter case, this indicates previously rare, once‐in‐a‐decade summer extremes will occur almost annually in the future under high warming.


| INTRODUCTION
Anthropogenic warming has been shown to enhance heat extremes, with recent events such as the 2019 record heat in France being, to a substantial fraction, attributed to the effects of human driven emissions (Vautard et al., 2020). These extreme heat events have wide environmental, economic and humanitarian impacts with the resulting temperatures increasing mortality rates (D'Ippoliti et al., 2010) whilst also affecting infrastructure and causing economic loss (Lass et al., 2011). As the intensity and frequency of extremely hot events is expected to increase further with global warming (Kharin et al., 2018;Seneviratne et al., 2018), the ability to understand potential changes in the occurrence of these events is vital to guiding future mitigation and adaption strategies. The present study examines projected changes in annual maximum temperature extremes at the target levels set out in the Paris agreement. It also investigates the potential effects of an additional degree of warming, since there is still a significant disparity between target and actual greenhouse gas emissions (Peters et al., 2020), suggesting the upper limit of the Paris agreement may be overshot.
Using output from Phase 6 of the Coupled Model Intercomparison Project (CMIP6), we explore changes in annual maximum temperatures, characterizing them using the generalized extreme value (GEV) distribution. The latter is suitable to estimate changes in the probability of rare extreme events such as high maxima (Zwiers and Kharin, 1998;Wehner, 2005), and aims to provide robust statistics for block maxima that enable calculating return values for rare events. Studies tend to find large changes in the return value of anomalously high temperatures in the future, yet in most cases, the change is dominated by changes in the mean of the future distribution. Some studies show a tendency for widening of the future distribution, for example, in Central Europe (Hegerl et al., 2004;Schär et al., 2004;Chan et al., 2020). Here we investigate if future simulations indicate systematic and robust changes in the shape or tail width of the distribution of daily extreme temperature. Furthermore, we analyse how extremely hot daily averaged events with return periods of 10 years at the nearpresent time are expected to change at 1.5, 2, and 3 C of global warming relative to the pre-industrial period.

| Climate model data
This study is based on daily near surface air temperature (tas) output from CMIP6, specifically the CMIP6 historical simulations and the future projections from the ScenarioMIP simulations (O'Neill et al., 2016). The last 10 years of the historical simulations were used to define a reference decade of 2005-2014 inclusive, during which CMIP6 historical forcing is applied. ScenarioMIP data sets extend from 2015, typically to 2100. This study used the SSP370 scenario simulations to determine the decades in which each model exceeds 1.5, 2, and 3 C of warming since pre-industrial. This scenario was chosen since it provides the largest number of ensemble members and hence is best suited to sample extreme daily scale heat events. SSP370 is a minimally mitigated scenario, with an increase in radiative forcing to 7.0 WÁm À2 by the end of the 21st century, driven by domestic and regionally driven policy. Several models have at least nine individual ensemble members in the SSP370 scenario (the project's minimum requirement for a large ensemble; O'Neill et al., 2016), and all climate models chosen reach all three warming thresholds. The choice of a particular emissions pathway was not expected to show a strong impact on results determined at the chosen levels of warming, as the focus of this investigation was the climatology of extremely hot daily temperatures at future levels of warming and not the rate of warming (Seneviratne et al., 2016;Wartenburger et al., 2017).
Four models with large ensembles were selected from the available data in the Centre for Environmental Data Analysis (CEDA) archive at the time of analysis (June 2020). The chosen models were the Canadian Centre for Climate Modelling and Analysis Canadian Earth System Model Version 5 (CanESM5) (Swart et al., 2019), the Institut Pierre Simon Laplace CM6A-LR (IPSL) , and the low and high resolved Max Planck Institute Earth System Models MPI-ESM1.2-LR (MPI-LR) and MPI-ESM1.2-HR (MPI-HR) (Müller et al., 2018 for an examination of differences between the two models). Table S1 shows a breakdown of respective sub-experiment ensemble members and native grid sizes. CanESM5 provided the largest ensemble of 50 members and the MPI models the smallest of 10. The IPSL, MPI-LR, and MPI-HR models were regridded before analysis using bilinear interpolation to a 128 Â 64 grid to match that of the lowest resolution model, CanESM5.
2.2 | Exceedance decades for 1.5, 2, and 3 C To determine when global warming levels of 1.5, 2, and 3 C are reached, future decadally averaged global mean temperatures from each model were compared to a reference period at which the estimated anthropogenic warming is well quantified. The use of this reference period allows us to combine the observationally constrained warming since the pre-industrial baseline with future warming, avoiding uncertainty due to the models' different warming history over the instrumental period. The IPCC Special Report on 1.5 C warming established that since 1850-1900, the global mean surface temperature has risen by a best estimate of 0.87 C to the period 2006-2015(IPCC, 2018. This is 1 year offset from the final CMIP6 historical period 2005-2014; however, the difference in mean surface temperature between the two decadal periods was found to be very small in the observed HadCRUT4 (Morice et al., 2012) gridded surface temperature data set and we considered this uncertainty negligible.
To identify the exceedance periods, the global mean surface temperature was first calculated for the reference period in the ensemble mean of each model. The annual mean near surface air temperature was then calculated from monthly data for the period 2015-2100 in the SSP370 scenario and a 120 month averaging filter applied. By subtracting the 10 year averaged mean for the reference period, the temperature anomaly relative to this period was obtained, and with the addition of the 0.87 C of known warming, the anomaly relative to pre-industrial found. The exceedance period was defined as the first decade in which the mean surface temperature anomaly exceeded the respective level and subsequently remained above it.
We averaged across all ensemble members to identify the decade in which each individual model reaches each level of warming (Table S2). The ensemble average allowed us to consider the exceedance due to the forced component of the warming signal. With this methodology, individual model extreme temperature changes are more directly comparable as they all consider the same magnitude of warming, irrespective of when it happens.

| Change in the intensity of extreme daily temperatures
Having identified the decades in which global warming exceeds 1.5, 2, and 3 C on average, the difference in the F I G U R E 1 Change in location parameter μ for the annual maximum daily averaged temperature index Tx1day at 1.5, 2, and 3 C of global warming for each model, shown relative to the near-present period 2005-2014. Hatching indicates areas in which there is non-robust change, where the mean change is less than the SD of bootstrapped values and therefore less than the internal model variability.
Land masses see greater warming than oceans, whilst some regions of the Arctic see very significant warming. Some localized cooling is seen at lower warming levels at higher latitudes distribution of annual maximum temperatures was evaluated between periods and climate models.
In this study, we focused on change in the intensity of the annual maximum of 1, 3, and 5 day averaged daily temperature, Tx1day, Tx3day, and Tx5day, respectively. Extreme value theory can be used to characterize the distribution of maximum values and has previously been applied to analysis of extreme weather and climate events such as precipitation and temperature extremes (Kharin et al., 2018). The GEV distribution (Coles, 2001) is fitted to block maxima, here the annual maximum (Zhang et al., 2011) of 1, 3, and 5 day averaged maximum temperatures. The GEV distribution is characterized by three parameters, μ the location, σ the scale, and ξ the shape. μ represents the mode of the distribution, σ the variability, and ξ the behaviour of the distribution tail. A large positive ξ results in a heavy tailed distribution while a negative value of ξ results in a distribution with a shorter tail. Changes in distribution of these parameters between the reference period and the exceedance decades were used to characterize how the most likely values of annual maxima, their spread, and rare extremes can be expected to change.
In the reference period and each exceedance decade, the maximum annual temperature indices were calculated for each grid cell, for every ensemble member of each model. As we investigated climatological decades, for each model, the number of resulting indices was equivalent to 10 times the number of ensemble members, which gave sample sizes ranging between 100 and 500. A GEV distribution could then be fitted to the resulting data for the entire ensemble, at each grid cell and for each model, using maximum likelihood estimation. To characterize uncertainty in the change of GEV parameters in each climate model's average change, the initial index data were first resampled using a bootstrapping technique, F I G U R E 2 Change in scale and shape parameters σ and ξ for the annual maximum daily averaged temperature index Tx1day at 3 C of global warming for each model relative to period 2005-2014. Hatching indicates areas in which there is non-robust change, where the mean change is less than the SD of bootstrapped values and therefore less than the internal model variability. Scale parameter shows a clear increasing trend over the tropics whilst decreasing over higher latitudes. There is a clear boundary at the edge of the Arctic sea ice which is also seen in the shape parameter Fitted GEV distributions of bootstrapped annual maximum daily averaged temperature index Tx1day at four different grid points selected to highlight regions of consistent and very uncertain change. As highlighted in boxes (a-d), these are: Svalbard near the Northern sea ice edge; over the Western Amazon region; South Western Australia, and Central Europe. Boxes (e-h) show the GEV fit for each model in the reference period, overlaid on ERA5 reanalysis data for the same period. Overall the IPSL model appears to produce the closest reproduction of near present day conditions, but still with large differences when compared to this reanalysis product. The divergence of CanESM5 over the Amazon is particularly pronounced. Boxes (i-x) show the PDFs for each model at all levels of warming and the reference period. While some areas show very strongly model dependent distribution changes (such as the over Amazon), South West Australia and Central Europe exhibit a somewhat more consistent slight widening of the distribution, although there are still exceptions for some models and thresholds. The thick lines in the PDFs represent the ensemble mean, whilst the low weight lines each represent the fit of an individual bootstrapped ensemble member. Shown below the MPI-HR GEV fits are histograms of the bootstrapped Tx1day data used to fit the distributions for that model resampling with replacement across all ensemble members for each model. One ensemble member of 10 values was removed for each sample, and 100 resamples were made. For each set of bootstrapped parameters, the mean and SD of the subsequent calculated GEV distributions were calculated. To discriminate between the warming signal and noise due to internal model variability, the bootstrapped average of each parameter was compared to 1SD of the resampled parameter fits. If the mean change was greater than the SD, and therefore greater 68% of the internal variability of the system, it was considered a possible true signal. A risk ratio (RR) is a commonly used method of quantifying the change of probability of occurrence of extreme climate events (Kirchmeier-Young et al., 2019). The RR is defined as the ratio of the probability of an event exceeding a threshold in a future climate scenario to its probability in the current climate. The probabilities are calculated as the integral of the probability density function (PDF), above a certain percentile, of the fitted GEV distribution. A RR > 1 indicates increasing probability in the future climate, and RR < 1 indicates a decreasing future probability, while a RR of 1 indicates no change in likelihood between the two climates. In this study, changes in 10 year return value events were examined: extreme temperature events which in the reference period have a 1-in-10 chance of occurring each year. The cumulative distribution function in the reference period was used to find the 90th percentile of the probabilities and the PDF integrated from this point to infinity, giving the probability of the rarest 10% of extreme events occurring (10% in the reference period). The process was repeated in the future scenario, integrating the new PDF from the same value to find the future probability of the same events. The ratio of the two integrals was then taken for each level of warming, and the resulting RRs calculated at each grid point.

| RESULTS
Results indicate changes in the probability of extreme temperatures with warming. This was diagnosed by determining changes in the parameters of the GEV for all grid points, and with analysis of specific grid points of interest. As illustrated for a generic GEV in Figure S1, an increase in the location parameter of the GEV leads to an overall increase in magnitude of events with an increase in frequency of rarer events in the tail. Changes in the shape parameter largely affect the tail of the distribution and the size of the peak. Increasing the shape parameter can be thought of as stretching the tail out and indicates an increase in the size and frequency of the largest extremes.
Alternatively, decreasing the shape parameter will shift the mean of the distribution towards the tail, causing more higher average maxima, but little change in rare extremes. Increasing the scale parameter increases the spread of the distribution, whilst reducing it increases the size of the peak.

| Changes in the characteristics of extreme temperature events
Change in location parameter dominates the shifts in GEV distribution at all three levels of warming. Figure 1 shows the change in μ in each model at the respective levels of warming. Significant warming of hot extremes occurs over land masses, notably central North and South America, Central Europe/Asia, and Northern Africa. The Arctic is found to warm rapidly, exhibiting some of the strongest warming, particularly in the IPSL model. Although models are generally in agreement, the MPI-LR and MPI-HR models indicate several areas of cooling at 1.5 and 2 C, particularly in the subpolar North Atlantic, which is consistent with previous studies (Sgubin et al., 2017), and in Central Africa and North East India, where the cooling may be related to changes in precipitation and the Monsoon, respectively, due to the dominant role the hydrological cycle plays in these regions. Can-ESM5 and ISPL also exhibit non-robust signals in the Southern Ocean and over Antarctica at 1.5 and 2 C. At 3 C, the models are in agreement to a positive change in location parameter consistent with warming, at all grid points except for a small number of grid points in Southern Ocean in the MPI-LR model. Results also show strong increases in annual maxima in the North Pacific Ocean and slower warming in the Southern Ocean. Warming over central Antarctica is greater than coastal warming, but results around Antarctica may be very dependent on ice-sheet and sea-ice models, which were not evaluated in this study. Heavily populated areas which see the greatest levels of warming in extreme temperatures include Southern and South Eastern Europe, the East coast of the United States, and South Eastern Brazil. Changes in scale parameter, indicating the variability of extremes, show clear trends across models when examined at the highest level of warming, with the tropics exhibiting a general increase in σ across all models, whilst higher latitudes show decreasing values of σ. We examined the changes at 3 C of warming to investigate the changes from the largest amount of warming ( Figure 2). This trend is particularly clear over Antarctica, except over the Ross and Filchner-Ronne ice shelves where σ increases. This increase over non-grounded ice is particularly strong over the Arctic sea ice, which is bounded by a region of strongly decreasing scale parameter at the ice edge. This change is influenced by the changing fraction of sea ice over high latitudes in Arctic summer, which when present decouples the sea surface from the atmosphere and influences the distribution of extreme temperatures. This trend of strong changes at the Arctic ice edge is also seen in the shape parameter, with all models agreeing on decreasing shape parameter indicating a reduction in the most extreme events over F I G U R E 4 Risk ratios for the annual maximum daily averaged temperature index Tx1day, at 1.5, 2, and 3 C of warming, for events with a return value of 10 years in the near-present reference period. Small regions exhibit reduced RRs due to cooling at lower levels of warming, but the general trend is of increasing RRs with warming. A large number of grid points show RRs > 10 at 3 C warming, indicating events which previously happened once a decade will occur annually. These regions are typically over the oceans where variability is less and as such a small amount of warming significantly increases the probability of an extreme occurring the Arctic. Elsewhere, changes in ξ show much less consistent trends between models, with large areas of nonrobust change in all models, and the fraction of significant changes across the globe close to the fraction of points with significant changes expected by chance (which is $32%).
The variation in responses of individual models was examined at several grid points. Of particular interest are locations which indicate strong changes in GEV parameters but varying levels of agreement between model data and subsequent changes. Figure 3 shows GEV distributions fitted from model output at four different grid points: Northern Svalbard near the edge of the Arctic sea ice, where the distribution narrows in the two models with the strongest warming particularly; the Western Amazon rainforest where two models show strongly widening distributions, but two other models show less clear results although with some widening of the distribution (note that MPI-HR shows increases slightly East of the chosen grid point); and South Western Australia and Central Europe, which exhibit slightly more consistent behaviour leading towards a modest widening of the distribution or neutral responses. The comparison of the fitted GEV distribution to histograms of annual maxima, and the relatively small spread in GEV fit indicates that choosing large ensembles overcomes substantial uncertainty due to internal climate variability. Results at these points instead indicate the wide variety of warming responses between models. The large uncertainty in the warming response, particularly over the Amazon, may be influenced by soil moisture response, vegetation response and possibly changing climate circulation. In summary, changes in the shape of future distributions vary strongly between models and indicate a major uncertainty in future extreme temperature distributions between models.
A comparison of the simulated distribution of annual maxima at each location in the reference period with histograms of the same index calculated from the ERA5 reanalysis (Hersbach et al., 2020) is also shown in Figure 3. It also illustrates that there is a substantial divergence from realistic conditions in some models, with models that show slightly more realistic present-day conditions at the chosen grid points tending to show slightly less pronounced changes in the distribution in the future.
3.2 | Change in RRs at 1.5, 2, and 3 C of global warming Changes in risk for extreme temperatures over land (i.e., risk ratios) are found to increase globally as mean surface temperature increases, see Table 1 and Figure 4, but with some regional reductions in RR linked to localized cooling, particularly over the Southern Ocean. The increase in longer duration events is greater than the increase of shorter duration events at all levels of global warming. Significant regions see RRs > 10 at higher levels of warming, indicating the now 1-in-10 year events in the reference period are projected to occur annually in these scenarios. These regions generally occur over the oceans where variability is less and therefore warming significantly increases the probability of an extreme occurring. Globally averaged RRs over land indicate considerably larger values when excluding Antarctica (as to focus only on human occupied land), shown by comparing Tables 1 and S3. Global land averaged RRs range from 3.18 at 1.5 C, to 8.02 at 3 C for Tx1day, increasing with index length to 3.29 and 8.11 for 1.5 C and 3 C, respectively, for the Tx5day index. This indicates a greater increase in longer duration annual heat extremes than shorter duration events, and an increasing risk of extreme temperatures with increasing global warming level.

| DISCUSSION AND CONCLUSIONS
We have used large ensemble models from the CMIP6 model inter-comparison project to investigate the change in extreme temperature events at 1.5, 2, and 3 C of global warming. Using the SSP370 emissions scenario to obtain the largest number of ensemble members, the future model simulations were compared to the last decade of historical runs, which were in turn related to the estimated difference between that decade and the period 1850-1900. The GEV distribution was then fitted to the annual 1, 3, and 5 day averaged near surface air temperature extremes for the decade in which the models reached the respective the levels of warming and resampled, using a bootstrap method to characterize uncertainty.
The four models used in this study differ in their grid resolution and the extent to which earth system feedbacks are incorporated, yet no clear systematic differences emerge with resolution or complexity. Higher resolution models may capture processes relating to heatwaves better, although hot extremes tend to be reasonably large scale in general (Krueger et al., 2015). Furthermore, the models differ in their ability to reproduce present day conditions as captured by a reanalysis product. This comparison highlights the need for large sample sizes in in models, due to the challenge posed by achieving a robust GEV fit to a small number of samples per decade. We thus chose models with relatively large ensembles, avoiding large uncertainty associated with internal climate variability of smaller ensembles (Maher et al., 2019). Instead, as evidenced over the Amazon grid point, inter-model differences in physics and parameterisations of processes appear substantial between the models. To improve future estimates of extreme event characteristics, an approach similar to Brunner et al. (2020) could be implemented to constrain model projections based on their historical performance and their inter-dependency, although this would require more large ensembles of individual models. The location parameter of the fitted distributions is found to dominate the global trend in distribution shifts, with significant increases over land masses and in the Arctic, consistent with the Arctic Amplification. The scale parameter exhibits a tendency for increases over the tropics and reduction at higher latitudes, while changes in shape parameter exhibit less consistent trends that are spatially more noisy, many of them insignificant, and with the models only agreeing on robustness and sign of change in small areas. The range of responses at individual grid points highlights the model-sensitivity of these changes. The exception is the Arctic, with all models agreeing on high magnitude changes over the sea ice and around the ice edge, a trend which was also found in the scale parameter. RRs calculated using the fitted distributions indicate strong increases, and show that we should expect to see a significantly increased frequency of annual extreme temperature events at projected future warming levels.