Increasing Intensity of Extreme Heatwaves: The Crucial Role of Metrics

In weather and climate applications, a wide range of commonly employed heatwave intensity indices relies either on cumulative or averaged values of temperature‐based variables. In this study, by comparing four different heatwave intensity indices applied to reanalysis data we show that metrics based on cumulative or averaged values lead to important differences in the detection of the most intense events of the period 1950–2021. This suggests that particular attention is needed when using the two families of metrics for assessing heatwave intensity. Indices based on cumulative values should be preferred over the ones relying on temporal averages, better allowing for the comparison of events of different length. Under these considerations, one of the considered cumulative indices is used for characterizing heatwaves of the period 1950–2021, showing that heatwaves that were unlikely before 1986 have become almost 10 times more frequent and up to three times more intense during recent times.

• The most intense heatwaves of 1950-2021 considerably change if considering intensity indices either based on cumulative or averaged values • An appropriate measure of heatwave intensity should be based on cumulative indices allowing to better compare events of different length • The most intense heatwaves of  have become up to 10 times more frequent and up to three times more intense during 1986-2021 daily temperatures (Russo & Sterl, 2011;Russo et al., 2014Russo et al., , 2015Russo et al., , 2016, there are many cases where the terms intensity or magnitude of a heatwave are instead used for referring to averaged values of temperature-based variables (Bitencourt et al., 2016;Cowan et al., 2014;Cueto et al., 2010;Ganguly et al., 2009;Holbrook et al., 2022;Lhotka & Kyselỳ, 2015;Perkins et al., 2012;Schaeffer & Roughan, 2017;Ward et al., 2016;Yu et al., 2020;Zhang et al., 2020). Despite their extensive use for characterizing the same heatwave feature, the two definitions are not equivalent and they might lead to different conclusions. There is indeed the need to better assess the effects of the use of metrics for heatwave intensity based on these two different definitions, highlighting possible drawbacks and contributing to a more unified and consistent application of indices in climate studies. This will further allow for a better comparability of events over time and in space. Driven by this goal, in this study the most intense heatwaves occurring over the period 1950-2021 are detected and characterized at a global scale by means of four different heatwave intensity indices either based on cumulative or averaged values of temperature-based variables.
All of the presented analyses are conducted on the ERA5-Land reanalysis data set (Muñoz-Sabater et al., 2021). Prior to the indices calculation, ERA5-Land is evaluated against the Berkeley-Earth and JRA-55 data sets to identify areas where ERA5-Land can be considered more reliable in terms of interannual variability of daily maximum temperatures. Then, the most intense heatwaves and the year in which these events occur over the period from 1950 to 2021, according to ERA5-Land, are determined using the four considered indices, with the goal of highlighting different conclusions arising from different methods. Finally, in a last step, differences between the first and the second half of the considered study period in terms of occurrence and magnitude of very extreme heatwaves are investigated for one of the proposed metrics and specific regions.

Data
The analyses presented in this study are based on daily maximum temperature for the period 1950-2021 from the ERA5-Land reanalysis data set, interpolated through conservative remapping on a regular grid at a spatial resolution of 0.25°longitude × 0.25°latitude.
ERA5-Land provides hourly information of surface variables at a spatial resolution of ∼9 km. The data is derived from a single simulation with the European Centre for Medium-Range Weather Forecasts Carbon Hydrology-Tiles scheme for Surface Exchanges over Land model, forced by meteorological fields of the lowest atmospheric level of the ERA5 reanalysis (Hersbach et al., 2020), with an additional lapse-rate correction (Muñoz-Sabater et al., 2021). The model version employed for the production of ERA5-Land is very similar to the one used for ERA5, but with an updated parameterization of the soil thermal conductivity after Peters-Lidard et al. (1998), technical fixes improving the conservation of soil moisture balance and additional improvements for the calculation of potential evapotranspiration fluxes. These improvements do not lead to remarkable differences between ERA5-Land and ERA5, given the fact that they still share common and similar parameterizations of land processes (Muñoz-Sabater et al., 2021). The main added value of ERA5-Land over ERA5 is attributable, according to Muñoz-Sabater et al. (2021), to the non-linear dynamical downscaling with corrected thermodynamic input, allowing for example, to better discriminate between land and sea points over coastal areas.
The reliability of ERA5-Land in terms of the interannual variability of the seasonal maxima of daily maximum temperatures is assessed here against two additional data sets: the gridded Berkeley-Earth observational data set (Rohde et al., 2013) (BE hereafter) and the Japanese reanalysis JRA-55 (Kobayashi et al., 2015). These data sets are chosen as they have a temporal coverage similar to ERA5-Land, with BE starting in 1950 and JRA-55 in 1958. The BE data set provides daily values of maximum temperature on a regular grid with a spatial resolution of 1°longitude × 1°latitude. These values are originally derived from observations with sub-hourly to hourly temporal resolution (Rohde et al., 2013). On the other hand, JRA-55 daily maximum temperatures are calculated here from the original 6-hourly gridded data presenting a spatial resolution of 1.25°longitude × 1.25°latitude. For the comparison against these two other data sets, ERA5-Land is first upscaled onto the respective coarser-resolution grids from each data set, through conservative remapping. Additionally, daily maxima of ERA5-Land derived from 1-hourly and 6-hourly data are respectively used for the comparison against BE and JRA-55.
All the employed data sets cover the entire globe. However, in this study only the data between −80° and 80°N are considered.

Heatwave Definition
Heatwaves are generally defined as consecutive periods with extreme temperatures exceeding a given threshold (Data, 2009;Domeisen et al., 2022). Similarly to Russo et al. (2015) and Perkins-Kirkpatrick and Lewis (2020), here we consider heatwaves as events of at least 3 consecutive days with temperatures exceeding the threshold Tr90 d , calculated as the 90th percentile of daily maximum temperatures in a sliding window of 30 days around the considered day d, over a 30-year reference period. We select the period from 1961 to 1990 as our reference period, as this period is often considered as a reference for long-term climate change assessments (Tavakol et al., 2020) and as the employed data sets start on 1958 at the latest.
We separately analyze heatwaves in different seasons. In order to include heatwaves that start in a season and finish in another, periods of 5 months are selected around each 3-month season for the definition of heatwaves, with an additional month at the beginning and at the end of their classical definition (e.g., May-September for boreal summer). Then, to avoid counting single events twice, heatwaves are assigned to a specific season depending on the largest number of days they have in the three central months of each season.

Heatwave Intensity Indices
With the goal of identifying possible differences arising from different definitions of heatwave intensity extensively applied in weather and climate applications, four indices are considered in this study, either using accumulated or averaged values of temperature-based variables. The first of these indices is based on the magnitude assessment of single heatwave events, while the other three jointly consider all the days characterized by a heatwave during an entire season. Below, the four considered indices are described in detail.

HWMId
The HeatWave Magnitude Index daily (HWMId) of Russo et al. (2015) is calculated here as the sum of the daily magnitude index M d over each of the days n composing a single heatwave event hw: where T d is the daily maximum temperature on day d of the heatwave hw. For computing M d , first anomalies of daily maximum temperature are computed for the day d with respect to the 25th percentile of seasonal temperature maxima over the reference period. Then, the anomalies are standardized by the interquartile range of the same seasonal temperature maxima over the reference period, allowing for a comparison of different points in space characterized by a different interannual variability of seasonal temperature maxima: T 30y25p and T 30y75p are, respectively, the 25th and 75th percentile values of the time-series composed of seasonal temperature maxima for the reference period 1961-1990 and considered season.
The methodology introduced by Russo et al. (2015) was designed for characterizing heatwaves over Europe, where the yearly temperature maxima generally occur in boreal summer. Here, considering almost the entire globe, it is important to acknowledge that over other areas yearly temperature maxima might take place at different times of the year. For this reason, similarly to Russo et al. (2016), the presented analyses are conducted separately for each season of the year (e.g., June July August (JJA)) and seasonal maxima of daily maximum temperatures are used in each case instead of the yearly maxima. Importantly, in this case a heatwave occurring between two distinct seasons would yield different HWMId values depending on the season it is attributed to. Hence, it becomes critically important to assign events unambiguously to individual seasons in order to ensure consistent analysis and interpretation.

HEATcum
The cumulative heat (HEATcum) defined in Perkins-Kirkpatrick and Lewis (2020), is given by the sum of the anomalies with respect to the threshold Tr90 d in daily maximum temperatures (see Section 2.2), over all days characterized by a heatwave in a given season, for each of the years of the study period: where d is the considered heatwave day and n the total number of heatwave days in the given season.

AVI
The heatwave Average Intensity (AVI) of Perkins-Kirkpatrick and Lewis (2020) is the average daily maximum temperature calculated over all the heatwave days of a season, for each year of the considered period: where, similarly to Equation 3, d is the day of a heatwave in the considered season, and n the total number of heatwave days in that season.

AVA
The heatwave Average Anomalies (here referred to as AVA) index is derived from the study of Perkins-Kirkpatrick and Lewis (2020) and represents the average of the daily maximum temperature anomalies with respect to the corresponding threshold Tr90 d , calculated for each year of the study period, over all the heatwave days of a given season: where, again, d is the given heatwave day and the total number of heatwave days in the considered season of a given year.

Changes in Very Extreme Events
In a last part of the study we conduct a comparison of the number and intensity of very extreme heatwaves between two periods of 36 years (hereafter referred to as Early Period (EP) and Late Period (LP), respectively), the first starting in 1950 and the second in 1986. Since for DJF it is only possible to compute 71 seasonal statistics over the period of ERA5-Land availability , in this case two periods of 35 years each are considered, starting in 1952 and 1987, respectively.
The comparison between the two periods is conducted separately for each season, for selected subregions of the Northern Hemisphere, namely Central North America (CNA), Europe (EUR) and Northern Asia (NAS, see Figure S1 in Supporting Information S1). These regions are the ones for which the ERA5-Land shows a better agreement with the other data sets in terms of the interannual variability of the seasonal maxima of daily maximum temperatures (Figure 1).
For each season, a heatwave is defined as "very extreme" when its corresponding HWMId value is larger than the 99.9th percentile of the values calculated for all the points of a given subdomain over the period 1950-1985.

Results
Prior to the comparison of the considered indices applied to daily temperature maxima derived from ERA5-Land, an evaluation of ERA5-Land against BE and JRA-55 is conducted in terms of the interannual variability of seasonal maxima of the target variable for each grid-cell of the domain. This allows us to better understand where the data are reliable for the estimation of the most intense heatwaves of a given period. Figure 1 shows global maps of the Spearman rank correlation calculated over each grid point of the domain at a 1°longitude × 1°latitude spatial resolution, between the time-series of seasonal maxima of daily maximum temperatures derived from ERA5-Land and BE. In all seasons, the two data sets show a good agreement in terms of the considered variable between 30°N and 65°N, while the agreement between the data sets in the continental Southern Hemisphere is lower. For the tropical regions of South America and Africa, even negative correlations are evident in some cases between the two data sets. Regions characterized by complex topography, such as Greenland, Antarctica and the Tibetan plateau, also show very low correlation between ERA5-Land and BE, with values generally lower than +0.2 for all seasons. Northern North America exhibits low correlation (below +0.5) in DJF. Also, Western Australia shows a lower correlation in DJF and SON than in the other seasons. The same analyses conducted between ERA-Land and JRA-55 produce a slightly better agreement between the two data sets than between ERA5-Land and BE, even though large parts of the southern hemisphere still exhibit correlations below 0.5 in all seasons ( Figure S2 in Supporting Information S1). This agrees well with the findings of Thompson et al. (2022), revealing a significant discrepancy between the two data sets over a large part of the continental Southern Hemisphere in terms of the most intense heatwaves of the period 1958-2021. A similar spatial agreement between the different data sets is also obtained when calculating the Spearman Rank correlation onto the JJA maxima of HWMId ( Figure S3 in Supporting Information S1).
In a next step, daily maximum temperatures from the ERA5-Land data set at a spatial resolution of 0.25°longitude × 0.25°latitude are used to determine the most intense heatwaves of the period 1950-2021 and the year in which they occur, according to the four indices defined in Section 2.3. The goal is to investigate whether and how the detection of the most extreme events changes when considering definitions of heatwave intensity either based on cumulative or averaged values. Figure 2 shows the grid-cell maxima of the four given indices for JJA, over the period 1950-2021. Figure 2 illustrates how the AVI has a completely different pattern of the maxima with respect to the other indices. This is due to the fact that the AVI considers absolute temperatures: for this index it is not possible to properly compare extremes over regions characterized by different climatologies. Considering the indices based on temperature anomalies, the AVA, relying on temporal averages, has a different pattern of the maxima with respect to the two other indices (i.e., HWMId and HEATcum). In particular, while HWMId and HEATcum both have some of their highest JJA values over Western Russia, which can be associated with the extreme summer heatwave of 2010 (Russo et al., 2014(Russo et al., , 2015, values of AVA over this region are not very pronounced. This behavior is noticeable also when considering the extreme values of HWMId and HEATcum for Central Africa and South America (where ERA5-Land exhibits a strong disagreement with respect to the other data sets): these anomalous events almost disappear in the case of AVA, in all seasons ( Figure S4 in Supporting Information S1).
Another interesting way to look at possible differences arising from the application of the different metrics is by considering, for each grid-cell, the years when the corresponding event with maximum intensity occurs over the period 1950-2021. Figure 3 shows that the maps of the year when the JJA maxima in the given metrics occur are pretty similar in the case of HWMId and HEATcum, presenting a spatial pattern correlation of ∼0.6. Conversely, the AVI and AVA show a very different spatial distribution compared to the two other metrics, with spatial correlations calculated against HEATcum of ∼0.2 in both cases. Importantly, the differences between AVA and HEATcum are considerably larger than the differences between HWMId and HEATcum, even though HWMId is based on a different target variable (i.e., standardized anomalies of daily temperature maxima) and considers Figure 2. June July August maximum values of (clockwise starting from the top-left corner) HeatWave Magnitude Index daily, cumulative heat, Average Intensity, and AVA, applied to ERA5-Land daily maximum temperatures over the period 1950-2021. The grid-cells for which the correlation of seasonal maxima of daily maximum temperatures between ERA5-Land and BE is lower than +0.5 are shaded in gray. single heatwaves of a season. This is true also when considering other seasons ( Figure S5 in Supporting Information S1), when calculating the HWMId over all the heatwave days of a season, as for HEATcum ( Figure S6 in Supporting Information S1), as well as when conducting the same analysis on different data sets (Figures S7 and S8 in Supporting Information S1). Hence, the conclusions that can be drawn from the two groups of indices on the most intense events and the years in which they occur are substantially different.
Finally, the trends in the intensity of single heatwave events are investigated over the entire period 1950-2021. The HWMId allows us to calculate heatwave intensity for single events, while at the same time providing a standardized measure of extreme heat useful for the comparison across time and space. Hence, for the analysis of changes in very extreme events as defined in Section 2.4, the HWMId of single heatwaves over the period 1950-2021 is calculated for the three selected subregions CNA, EUR, and NAS (see Figure S1 in Supporting Information S1). Figure 4 shows for each season and considered region the number of very extreme heatwaves for different values of HWMId. In general, a higher number and more intense extreme heatwaves are evident over the period 1986-2021 compared to 1950-1985, in all seasons and regions. In CNA, the maximum intensities are higher during LP than in EP, for all seasons, up to almost two times in DJF. The number of very extreme events also strongly increases in all seasons in this region during LP compared to EP, up to almost four times more in MAM. In CNA, only a small number of LP events present a value of HWMId higher than the maximum recorded in EP (vertical dotted line, Figure 4), up to 118 events more during JJA. In NAS, the maximum intensities are also more  Figure 2, but for the year when the June July August maximum values of (clockwise starting from the top-left corner) HeatWave Magnitude Index daily, cumulative heat, Average Intensity, and AVA occur over the period 1950-2021, according to ERA5-Land daily maximum temperatures. The grid-cells for which the correlation of seasonal maxima of daily maximum temperatures between ERA5-Land and BE is lower than +0.5 are shaded in gray.

of 11
pronounced during the most recent of the two periods, for all seasons, with values up to more than three times higher in JJA. For NAS the number of very extreme events increases by more than a factor of two over LP, in all seasons, with an exceptional increase by a factor greater than five in JJA. For the same region, during LP, a large amount of events presents HWMId values higher than the highest EP value, for a maximum of 259 times in JJA and 300 in MAM. The largest changes between the two periods in both the number of events and their maximum intensities are evident for Europe, in particular in JJA, DJF, and MAM. Here an event considered very extreme during EP occurs at least four times more often during the most recent period in JJA and MAM, and more than nine times in DJF. The maximum HWMId value registered over the two periods for EUR is almost the same in SON and MAM, approximately twice in DJF and up to three times more in JJA during recent times. Additionally, for EUR, in an exceptionally high number of cases the maximum HWMId value of the period 1950-1985 is exceeded during the most recent period, up to 615 times in DJF and almost 2000 times in JJA.

Discussion and Conclusions
Several studies have called for a more unified definition and assessment of heatwave characteristics (Perkins & Alexander, 2013;Perkins-Kirkpatrick & Lewis, 2020). Nonetheless, in the climate and weather community, a plethora of different approaches is still employed for the definition of heatwave intensity, with metrics based on cumulative or averaged values of a target variable often used for referring to the same feature.
The results presented in this study show that the selection of metrics for the assessment of heatwave intensity needs caution: the year and spatial distribution of the most intense events over the period 1950-2021, as calculated from ERA5-Land daily maximum temperatures, change remarkably when considering two different families of metrics either relying on temporal averages or on cumulative values.
The use of heatwave intensity indices based on cumulative values should be preferred over the ones relying on temporal averages since, as already suggested by Russo et al. (2014), assessing intensity through averaged values does not allow for an unequivocal comparison of the magnitude of events with differing length. One simple example that could help in clarifying this point further is by considering two different heatwaves, the first one, HW1, lasting 3 days and the second, HW2, lasting 4 days. Let us suppose that HW1 has a value of the anomalies of +3°C for each of the three heatwave days, and HW2 has anomalies of +3°C for three heatwave days and of 2°C for the fourth one. When considering the average value of the anomalies, the event HW1 (mean value = +3°C) will misleadingly be considered more intense than HW2 (mean value = +2.75°C).
While measures of average heatwave intensity can be used for characterizing heatwaves, the comparison of events based on such metrics must be conducted with extreme care and is possible only when considering averaged intensity jointly with the length of an event. In climate and weather applications examining heatwaves in terms of departures from a reference value, a more unequivocal approach that could avoid any possible misleading assessment of heatwaves, allowing for a better comparison of events of different length, is to measure the intensity by cumulative values of the anomalies with respect to a given threshold. This choice is further justified by the fact that the impact of heatwaves is primarily determined by the accumulation of excess heat experienced over a specific time period, rather than relying on averaged values (Guo et al., 2017;Perkins-Kirkpatrick & Lewis, 2020). This measure can then be accompanied by the measurement of additional features such as the maximum amplitude of an event and its duration, following the framework proposed for hydro-climate extremes by Biondi et al. (2005Biondi et al. ( , 2008 and Zuniga et al. (2021).
Based on these considerations, a grid-point-based analysis of the trends of heatwaves over the distinct periods 1950-1985 and 1986-2021 is successively performed using the cumulative index HWMId of Russo et al. (2015). In this case, only the regions where ERA5-Land presents a higher correlation against JRA-55 and BE in terms of the interannual variability of seasonal maxima of daily maximum temperatures are considered, namely Europe, CNA, and NAS. Overall, there is a clear increase in the number and intensity of heatwaves over the recent years 1986-2021, compared to what was considered very extreme during 1950-1985. Europe is the area where the most pronounced changes between the two periods emerge, with the total number of very extreme events increasing almost ten-fold in boreal winter, and the maximum intensity reaching values up to three times higher in boreal summer during the most recent times, compared to the former period: what was almost impossible during the period 1950-1985 has become more common and extreme during the years from 1986 to 2021.
This study sets the basis for a more unified and improved use of metrics for the calculation and comparison of heatwave intensity, at the same time providing an analysis of the trends in the number and intensity of very extreme events over the period from 1950 to 2021, for selected regions, revealing exceptionally severe changes in heatwaves.

Data Availability Statement
The ERA5-Land hourly near surface temperature data used for the computation of the different heatwave indices proposed in the study are available at the ECMWF Copernicus Climate Change Service (C3S) Climate Data Store (CDS) via https://doi.org/10.24381/cds.e2161bac. The Berkeley-Earth gridded daily maximum temperature observational data are available at http://berkeleyearth.org/data/. 6-hourly JRA-55 near-surface temperature data are available at the Research Data Archive of the National Center for Atmospheric Research, Computational and Information Systems Laboratory, via https://doi.org/10.5065/D6HH6H41. The scripts used for the performance of the presented analysis are available at the following link: https://doi.org/10.5281/zenodo.7974769.