Towards a more comprehensive assessment of the intensity of historical European heat waves (1979–2019)

Europe has been affected by record‐breaking heat waves in recent decades. Using station data and a gridded reanalysis as input, four commonly used heat wave indices, the heat wave magnitude index daily (HWMId), excess heat factor (EHF), wet‐bulb globe temperature (WBGT) and universal thermal climate index (UTCI), are computed. The extremeness of historical European heat waves between 1979 and 2019 using the four indices and different metrics is ranked. A normalisation to enable the comparison between the four indices is introduced. Additionally, a method to quantify the influence of the input parameters on heat wave magnitude is introduced. The spatio‐temporal behaviour of heat waves is assessed by spatial–temporal tracking. The areal extent, large‐scale intensity and duration are visualized using bubble plots. As expected, temperature explains the largest variance in all indices, but humidity is nearly as important in WBGT and wind speed plays a substantial role in UTCI. While the 2010 Russian heat wave is by far the most extreme event in duration and intensity in all normalized indices, the 2018 heat wave was comparable in size for EHF, WBGT and UTCI. Interestingly, the well‐known 2003 central European heat wave was only the fifth and tenth strongest in cumulative intensity in WBGT and UTCI, respectively. The June and July 2019 heat waves were very intense, but short‐lived, thus not belonging to the top heat waves in Europe when duration and areal extent are taken into account. Overall, the proposed normalized indices and the multi‐metric assessment of large‐scale heat waves allow for a more robust description of their extremeness and will be helpful to assess heat waves worldwide and in climate projections.


| INTRODUCTION
Heat waves are one of the most dangerous natural hazards worldwide (Campbell et al., 2018), and can lead to tens of thousands of premature deaths, as in central Europe in 2003 and western Russia in 2010 (Gasparrini et al., 2017; Grumm, 2011; Robine et al., 2008). Recent European heat waves occurred in 2018 and 2019, the latter leading to record-breaking temperatures in France, Benelux and western Germany. The range of widespread impacts includes health, crop and infrastructure failure, electricity, business interruption and water shortage (Deryugina & Hsiang, 2014; Forzieri et al., 2017; Werrell et al., 2015). Global surface temperatures have now reached about 1 °C above pre-industrial levels, a warming that has contributed to measurable increases in heat wave occurrence (Barriopedro et al., 2011; IPCC, 2021) and is expected to accelerate in the coming decades (IPCC, 2021; IPCC, 2012). Vogel et al. (2019) estimated the persistent 2018 heat wave and drought to recur with 65% and 97% probability in a +1.5 °C and +2 °C warmer world, respectively. Given that the affected area is estimated to increase by 16% with every degree of warming, heat waves will become an even more dominant factor in European summers.
A common definition of a heat wave is 'a succession of at least three days with hot temperatures' (WMO, 2016). There is a wide range of metrics used to describe the impact of heat waves, depending on the scope of the study (Perkins, 2015). They range from single- to multi-parameter indices and differ greatly in calculation complexity. Regarding human health related impacts alone, dozens of heat indices have been developed over the last century (Epstein & Moran, 2006). Here it is important to distinguish between 'indices' (to identify) and 'aspects' of heat waves (to quantify) (Perkins & Alexander, 2013). The identified events are evaluated in terms of frequency, duration, intensity and spatial extent. Still, the spatio-temporal evolution can be completely different between events (Hobday et al., 2016). Thus, many different approaches exist to describe the 'extremeness' of heat waves (Shafiei Shiva et al., 2019). Proposals have been made to define heat waves with categories similar to those used for tropical cyclones (Hobday et al., 2018; Loridan et al., 2016). To enable comparability between climate studies, a narrow framework with the most important indices to describe heat waves would be desirable (Xu & Tong, 2017).
Four frequently used indices are chosen in this study to diagnose heat waves in Europe. These are the heat wave magnitude index daily (HWMId, Russo et al., 2015), the excess heat factor (EHF, Nairn & Fawcett, 2015), the wet-bulb globe temperature (WBGT, Budd, 2008) and the universal thermal climate index (UTCI, Błażejczyk et al., 2013). HWMId is often used in studies investigating the dynamics of heat waves (Zschenderlein et al., 2019) or climate change (Russo et al., 2015), whereas the other indices are preferred when impacts to human health are evaluated (Di Napoli et al., 2019; Hatvani-Kovacs et al., 2016; Heo et al., 2019).
The overall aim of this work is to use the four indices to improve the quantification and comparability of European heat waves. Thus, we analyse European heat wave metrics based on both station data and ERA5 reanalysis (Hersbach et al., 2020). To allow for intercomparability, a normalisation is applied. Using a multi-linear regression analysis, the variance explained in the time series of the heat wave indices by the different meteorological input parameters, that is, temperature, radiation, humidity and wind speed, is explored to understand their contribution to the extremeness of heat waves. We also introduce and visualize a cumulative intensity based on intensity, duration and areal extent of European heat waves that requires a tracking of heat waves. This article is structured as follows. In Section 2, we describe the data, definitions and methodology used. Section 3 presents the results on the local (station based) spatial scale for 2019, while Section 4 presents the results for longer temporal scales and continental spatial scales. The summary and discussion are featured in Section 5.

| DATA AND METHODS
To illustrate the behaviour of the four heat wave indices employed in Section 3, we analysed observations from the synoptic stations Cologne-Bonn (Germany, WMO No. 10513) and Montpellier (France, WMO No. 07643), where all-time records were broken on July 25 and June 28, 2019, respectively. The former is representative for midlatitude temperate climate, and the latter for Mediterranean climate. To calculate the indices, hourly 2 m temperature and dew point temperature, 10 m wind speed and an insolation measure depending on availability (surface global radiation for Montpellier, cloudiness for Cologne-Bonn) were considered for 1979-2019. Before 1981 and 1994, only three-hourly observations were available for Cologne-Bonn and Montpellier, respectively. In Section 4, we use ERA5 reanalysis hourly temperature, dew point temperature, wind speed and surface solar irradiation data from 1979 to 2019 (Hersbach et al., 2020) on a 0.28125° regular latitude-longitude grid. This corresponds to about 31 km horizontal resolution. Our study area covers all countries in continental Europe including the European part of Russia, as well as the British Isles.

| Heat wave indices and event definitions
The selected indices are HWMId, EHF, WBGT and UTCI. They are widely used, are considerably different in formulation and are applicable for and across different sectors like health and industrial work and production. HWMId is a climatological index based on daily maximum temperature. EHF uses daily mean temperature and includes an 'acclimatisation factor' EHI_accl, which is important for societal and technical adaptation to high temperatures. WBGT and UTCI include humidity, wind and radiation, important for effects on human health. In addition to current air temperature, WBGT requires natural wet bulb temperature and black globe temperature using the methods of Liljegren et al. (2008) and Matzarakis et al. (2007, 2010). UTCI requires 10 m wind speed, vapour pressure and calculated mean radiant temperature for the regression function or Fiala model (Błażejczyk et al., 2013). The suitable meteorological input data for the calculation (e.g., vapour pressure, cloudiness) are listed in the fifth column of Table 1 and explained in more detail in Supporting Information (SI). While HWMId is dimensionless, WBGT and UTCI are in units of °C, whereas EHF is in °C². The threshold criteria for when a day is considered as part of a heat wave and the corresponding literature are also indicated in Table 1. The first step to enable comparable event detection is to use relative thresholds rather than absolute ones.
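To illustrate how the acclimatisation factor enters EHF, a minimal Python sketch is given here. It assumes the formulation EHF = EHI_sig × max(1, EHI_accl) from Table 1, with EHI_sig taken as the three-day mean temperature minus the 95th percentile of daily mean temperature and EHI_accl as the three-day mean minus the mean of the preceding 30 days. The forward-looking three-day window, the function name and the synthetic temperature series are assumptions made for illustration, not details taken from this study.

```python
import numpy as np

def excess_heat_factor(t_mean, t95):
    """Daily EHF from daily mean temperatures (deg C).

    Assumed windows: three-day mean over days i..i+2 and an
    acclimatisation reference over days i-30..i-1.
    Days without complete windows are returned as NaN.
    """
    t_mean = np.asarray(t_mean, dtype=float)
    n = t_mean.size
    ehf = np.full(n, np.nan)
    for i in range(30, n - 2):
        three_day = t_mean[i:i + 3].mean()
        ehi_sig = three_day - t95                       # excess over climatological threshold
        ehi_accl = three_day - t_mean[i - 30:i].mean()  # excess over recent conditions
        ehf[i] = ehi_sig * max(1.0, ehi_accl)
    return ehf

# Hypothetical series: 30 days at 20 deg C followed by 5 days at 30 deg C.
series = np.array([20.0] * 30 + [30.0] * 5)
ehf = excess_heat_factor(series, t95=25.0)
```

In this sketch an abrupt shift from cool to hot weather gives a large EHI_accl and therefore amplifies EHF, whereas after a long warm spell the factor falls back to 1.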
To express WBGT and UTCI in relative terms like HWMId and EHF and to warrant comparability, we define a heat wave day when the daily maximum of WBGT or UTCI is above the 90th percentile for the summer half year (April-September). The daily maximum of the latter two indices is taken from hourly data, except before 1981 for Cologne-Bonn and before 1993 for Montpellier. For those periods only 3-hourly data was available (cf. SI). All indices are evaluated for the extended summer season (April-September) in ERA5 data. The heat wave definition has to be fulfilled for three consecutive days, ending on the first heat wave day. For each day of the heat wave, index values (referred to as 'daily intensity') are calculated according to the formulae in Table 1. The number of identified days forming an uninterrupted series of values larger than zero defines the event duration. An exception is HWMId, since it distinguishes between heat wave occurrence and magnitude, i.e. heat wave days can have zero magnitude and in that case they do not contribute to the event magnitude. Thus, heat wave duration is determined by days with HWMId ≥ 0 (cf. Russo et al., 2015). The heat wave intensity at a station or grid point is determined by summing all index values over the duration of the event, corresponding to a 'cumulative intensity' or 'event sum' (Hobday et al., 2016). Regarding intercomparability, we divide all index values by the 85th percentile before aggregating them to the cumulative intensity (see Figure S1). The 85th percentiles are determined from the annual maximum values per grid point for 1979-2019. As some years may have zero values, the assessment of the 85th percentile was carried out following Schlueter et al. (2019) to account for up to 30% zero values in the 41-year investigation period. The choice of the 85th percentile is in line with Nairn et al. (2018).

TABLE 1 Description of applied heat wave indices with formulae, units, threshold criteria, input data and references

Index | Formula | Unit | Threshold criteria | Input data | References
HWMId | […] | ND | Uninterrupted series of days with daily T_max > daily 90th percentile of T_max with 31-day centred window | 2 m temperature | Russo et al. (2015)
EHF | EHI_sig × max(1, EHI_accl) | °C² | Uninterrupted series where three-day mean temperature > yearly 95th percentile of T_mean and higher than previous 30-day period | 2 m temperature | Nairn and Fawcett (2015)
WBGT | 0.7 T_nwb + 0.2 T_g + 0.[…]

Note: […] percentiles of the annual daily temperature maxima in the reference period. The reference period 1979-2019 is used for all indices. EHI_sig: excess heat index, significance factor; EHI_accl: excess heat index, acclimatisation factor. T_nwb: natural wet bulb temperature; T_g: black globe temperature; T_a: current air temperature. v_10m: ambient 10-metre wind speed; e: vapour pressure; T_mr: mean radiant temperature. 'ND' in the third column indicates a dimensionless index. The introduced threshold criteria for WBGT and UTCI as well as the standard meteorological parameters used in the computations are in italics. Additional descriptions of the four indices and the input parameters are included in Supporting Information (SI).
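The event definition and the normalised cumulative intensity described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the daily intensity is taken here as the plain exceedance above the threshold rather than the index formulae of Table 1, the zero-value handling of Schlueter et al. (2019) is not reproduced, and the function names, the threshold and the scaling value are hypothetical.

```python
import numpy as np

def heat_wave_events(index, threshold, min_days=3):
    """Return (start, end) pairs (end exclusive) for every run of at least
    `min_days` consecutive days with index > threshold."""
    hot = np.asarray(index) > threshold
    events, start = [], None
    for day, is_hot in enumerate(hot):
        if is_hot and start is None:
            start = day                      # a run of hot days begins
        elif not is_hot and start is not None:
            if day - start >= min_days:      # keep only runs that are long enough
                events.append((start, day))
            start = None
    if start is not None and hot.size - start >= min_days:
        events.append((start, hot.size))     # run reaching the end of the series
    return events

def cumulative_intensity(daily_intensity, events, scale):
    """Sum the daily intensities of each event after dividing by `scale`,
    which stands in for the 85th percentile of the annual maxima."""
    daily = np.asarray(daily_intensity, dtype=float) / scale
    return [float(daily[s:e].sum()) for s, e in events]

# Hypothetical daily index maxima and a constant threshold of 4.
index = np.array([1, 5, 6, 7, 2, 8, 9, 1, 6, 6, 6, 6], dtype=float)
events = heat_wave_events(index, threshold=4.0)
exceedance = np.clip(index - 4.0, 0.0, None)
intensities = cumulative_intensity(exceedance, events, scale=2.0)
```

The two-day spell in the middle of the series is discarded because it is shorter than three days; the threshold may also be passed as an array of day-specific percentiles of the same length as the series.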
Multi-linear regressions are performed considering all days with index values above zero. This corresponds to days with either local (for the station-based analysis) or grid point based heat waves. These models are used to assess the percent variance explained by the input parameters and to estimate the change (in physical units) required for a given parameter to increase the severity of the heat wave by 1 (see Section 3 and SI). The linear model follows the equation: y = a + Σ_p b_p × x_p + e, where the sum runs over the input parameters p, with y: absolute index value > 0, x_p: absolute value of input parameters (temperature, humidity, radiation, wind), a: intercept, b_p: regression weights for input parameters, e: other unaccounted, non-linear factors. The inverse of each regression coefficient b_p considered separately indicates by how much a parameter has to increase, in order to increase the index value y by 1, assuming it is the only responsible factor for the increase. Note that for EHF the only meaningful direct input parameters for the equation are three-day and 30-day mean temperatures whereas for HWMId only daily maximum temperature is used as input. The coefficient of determination R² gives information about how well the model describes the relationship between the index and the parameters.
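A minimal sketch of such a multi-linear fit is shown here, using ordinary least squares from NumPy. The synthetic input series, the coefficient values and the function name are assumptions chosen for illustration; the sketch returns the intercept, the regression weights b_p and R², and derives the change per parameter needed to raise the index by 1 as the inverse of each weight. It does not reproduce the partitioning of explained variance among the parameters.

```python
import numpy as np

def fit_multilinear(X, y):
    """Ordinary least squares for y = a + sum_p b_p * x_p + e.

    Returns the intercept a, the weights b (one per column of X) and R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])   # leading column for the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coef[0], coef[1:], 1.0 - ss_res / ss_tot

# Hypothetical heat wave days with three input parameters.
rng = np.random.default_rng(0)
temperature = rng.uniform(25.0, 40.0, 200)
humidity = rng.uniform(10.0, 30.0, 200)
wind = rng.uniform(0.0, 10.0, 200)
index = 2.0 + 0.5 * temperature + 0.25 * humidity - 0.1 * wind

a, b, r2 = fit_multilinear(np.column_stack([temperature, humidity, wind]), index)
# Change in each parameter (in its own units) that raises the index by 1.
change_for_unit_increase = 1.0 / b
```

Because the synthetic index is an exact linear combination, the fit recovers the chosen weights and R² is essentially 1; with real data the residual term e lowers R².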

| Representation of spatio-temporal characteristics of heat waves
We consider the contiguous area affected by heat waves for successive days to quantify its total area, duration and intensity. Heat wave areas were defined by contiguous grid points with index values > 0 (≥0 for HWMId). The area represented by these grid points has to cover at least 500,000 km² over land pixels. The identified areas of successive days are aggregated into one heat wave event, if their centroids (intensity weighted over area) lie within 1000 km distance. Thus, the large-scale duration is the number of consecutive days when these additional two criteria are met, in line with Sánchez-Benítez et al. (2019). The large-scale intensity of a heat wave event is defined as the sum of all daily intensities and for all involved grid points. The spatial extent is defined as arithmetic mean of the affected daily land areas over the duration of the heat waves. The above-defined characteristics intensity, duration and spatial extent are visualized as 'bubble plots' (Ouzeau et al., 2016) with duration on the x-axis, intensity metric on the y-axis and the size of the bubbles representing the areal extent (see Section 4). Additionally, we calculated percentiles for all parameters (temperature, radiation, humidity, wind) for all grid points in heat wave areas. The percentiles are calculated as follows:
1. For each grid point, select the hour when the daily maximum index value occurred.
2. Calculate the percentile with respect to the distribution obtained from all ±5-day centred intervals around the associated date and time obtained in step 1, for all years of the reference period. This is done to take the diurnal and seasonal cycles into account.
3. The percentiles are accumulated following the same procedure as for large-scale intensity and the mean of all contributing grid points and days was taken, as shown in Section 4.
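The aggregation of daily heat wave areas into large-scale events can be sketched as follows. This is a simplified Python illustration, not the tracking code used in this study: it assumes at most one heat wave area per day, takes the intensity-weighted centroid as a given input, measures centroid distances with the haversine formula on a sphere of radius 6371 km, and uses hypothetical function names and input values.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def summarise(days):
    """Duration, large-scale intensity and mean area of one event."""
    return {
        "duration": len(days),
        "intensity": sum(d["intensity"] for d in days),
        "mean_area_km2": sum(d["area_km2"] for d in days) / len(days),
    }

def track_events(daily_areas, min_area_km2=500_000, max_shift_km=1000):
    """Group consecutive daily areas into events.

    `daily_areas` holds one dict per consecutive day with keys 'area_km2',
    'centroid' (lat, lon) and 'intensity', or None when no area exists.
    """
    events, current = [], []
    for area in daily_areas:
        qualifies = area is not None and area["area_km2"] >= min_area_km2
        if qualifies and current:
            shift = haversine_km(*current[-1]["centroid"], *area["centroid"])
            if shift > max_shift_km:         # centroid jumped too far: new event
                events.append(current)
                current = []
        if qualifies:
            current.append(area)
        elif current:                        # area criterion failed: event ends
            events.append(current)
            current = []
    if current:
        events.append(current)
    return [summarise(e) for e in events]

# Hypothetical sequence of five consecutive days.
days = [
    {"area_km2": 600_000, "centroid": (55.0, 37.0), "intensity": 10},
    {"area_km2": 700_000, "centroid": (56.0, 40.0), "intensity": 12},
    {"area_km2": 400_000, "centroid": (56.0, 41.0), "intensity": 3},
    {"area_km2": 800_000, "centroid": (48.0, 2.0), "intensity": 5},
    {"area_km2": 800_000, "centroid": (60.0, 30.0), "intensity": 6},
]
events = track_events(days)
```

In this example the first two days form one event, the third day fails the area criterion, and the last two days are kept apart because their centroids are more than 1000 km from each other.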

| LOCAL EVALUATION OF HEAT WAVES
The behaviour of the four indices identifying and quantifying local heat waves (cf. Table 1) is shown in Figure 1. Heat wave days (cf. Table 1) are marked in grey above the x-axis in Figure 1c-j. The number and 'extremeness' of the heat waves strongly depend on the choice of index (Figure 1). For Cologne-Bonn (Figure 1c,e,g,i), three distinct events in June, July and August are identified. However, the June event is split into two parts for EHF and UTCI (Figure 1e,i), and additional differences in duration and intensity are noted. Adapting Schlegel and Smit (2018), we define five heat wave categories: slight (>80th), moderate (>85th), strong (>90th), severe (>95th) and extreme (>98th). The percentiles are calculated from all normalized index values larger than zero for 1979-2019. The June event reaches the 'moderate' category for HWMId, 'extreme' for EHF and 'severe' for WBGT and UTCI. The July event is 'extreme' for all indices, with the highest daily value of the whole record for M_d (cf. Table 1) and UTCI, ranking third and fourth for EHF and WBGT, respectively (not shown). EHF displays by far the largest value for the first peak (Figure 1e), but is only third, after UTCI and WBGT, for the end of August heat wave. This reflects the enhancement of heat wave intensity given an abrupt shift from cold to hot weather. For the extreme heat wave at the end of July, WBGT is lower than all other indices, most likely because this heat wave was comparatively dry. The August event with lower peak temperature than in July but higher humidity is not exceptional for HWMId and EHF, but is 'strong' for UTCI and 'severe' for WBGT due to a high influence of humidity.
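The five-category scheme can be illustrated with a short Python sketch. It assumes that the category thresholds are the stated percentiles of all normalized index values larger than zero; the function names and the synthetic values are hypothetical, and NumPy's default linear interpolation is used for the percentiles.

```python
import numpy as np

# Percentile bounds ordered from most to least severe.
CATEGORIES = [(98, "extreme"), (95, "severe"), (90, "strong"),
              (85, "moderate"), (80, "slight")]

def category_thresholds(normalized_values):
    """Percentile thresholds computed from all values larger than zero."""
    values = np.asarray(normalized_values, dtype=float)
    positive = values[values > 0]
    return {name: float(np.percentile(positive, p)) for p, name in CATEGORIES}

def classify(value, thresholds):
    """Return the most severe category whose threshold is exceeded, else None."""
    for _, name in CATEGORIES:
        if value > thresholds[name]:
            return name
    return None

# Hypothetical normalized index values 0, 1, ..., 100.
thresholds = category_thresholds(np.arange(0, 101))
labels = [classify(v, thresholds) for v in (99, 96, 91, 86, 81, 50)]
```

Values that do not exceed the 80th percentile receive no category, which corresponds to days that are part of a heat wave but not exceptional.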
The temperature and percentile values are higher in Montpellier (Figure 1d,f,h,j) than in Cologne-Bonn and their distribution is narrower. The indices identify between 4 (UTCI) and 6 events (EHF), with considerable differences in their temporal evolution. All indices agree on the strongest event at the end of June 2019, with HWMId showing the largest intensity by far. This day exhibits the highest daily peak in the station record for 2-metre temperature and all indices. The duration of this event ranges from 5 days for HWMId to 13-14 days for the other indices. On July 12, another hot dry peak is captured as 'moderate' by HWMId and 'extreme' by UTCI, but it is not exceptional for WBGT and EHF given the lower humidity and recent hot weeks, respectively. The index values remain positive for the majority of the period until mid-August (end of August for WBGT). However, only WBGT reaches the 'moderate' category again after mid-July. The high temperature percentiles between end of August and mid-September are also found in HWMId, but do not count for the magnitude as absolute temperatures are below the T_25p threshold.
To learn more about the behaviour of the four indices, the influence of the input parameters on the index values is calculated via multi-linear regression (see also Table S1). By construction, the variation of HWMId can be explained completely by T_max (R² = 1). EHF depends 4-5 times more strongly on the three-day mean than on the 30-day mean. WBGT and UTCI show different sensitivity to changes in the input parameters. For all four indices, temperature has the largest and radiation the smallest fraction of influence on the index values. Humidity takes second place for WBGT, with an influence nearly as high as that of T_max, while for UTCI it is wind speed. The five-category scheme of Figure 1 applied to all input parameters can be found in Figure S2.

| LARGE-SCALE EUROPEAN HEAT WAVES
Towards an intercomparison of the four indices, the spatial distribution of the maximum duration and cumulative intensities at grid points for the entire study period (1979-2019) is displayed in Figure 2. The longest heat wave duration is reached in Western Russia with 42, 59, 50 and 46 days for HWMId, EHF, WBGT and UTCI, respectively, which corresponds to the 2010 event (Barriopedro et al., 2011). The spatial median is 16 days for HWMId, but longer for EHF (29 days), WBGT (20 days) and UTCI (19 days). The highest local cumulative intensities are 31 (EHF), 36 (WBGT), 46 (UTCI) and 52 (HWMId). The spatial mean maxima are between 10 and 13. Figures S3 and S4 indicate the calendar year when the maximum occurred, thus enabling the assignment of the maximum duration and intensity to known European heat waves. For example, the 2010 Russian heat wave, the 2003 western European and the 2018 northern European event are frequently found in both intensity and duration charts with the 2010 event standing out in the maximum duration and intensity (Figure 2). Table 2 includes the 20 strongest heat wave events by large-scale intensity, identified using the four indices. Generally, HWMId shows the lowest duration, area and grid point/large-scale intensity of heat waves, while other indices yield around twice as many days. WBGT and UTCI display double the large-scale intensity and nearly double the maximum area compared to HWMId. The percentiles of the input parameters, as derived from all grid points and days of the 20 strongest heat waves, have also been compared between indices. For example, all grid points identified during the 20 strongest heat wave events reach a mean temperature percentile of 98.7 and radiation percentile of 96.1 for UTCI. The wind percentiles are most often in the range of 40. Further differences arise because EHF relates to three-day means.
The interquartile ranges in square brackets give information on the most robust metrics and their differences between the events and the indices. The interquartile ranges of the percentiles are comparatively narrow for the indices, that is, the case-to-case variability in the influence of the parameters is relatively low. The ranges are lowest for UTCI and most of the time lower than the differences between the indices. The interquartile range in temperature percentiles is small for all indices, while the largest variations are found for wind, with small differences between UTCI and WBGT in general. The variation of the mean or maximum area is also lower than the mean difference of, for example, the 20 HWMId and the 20 UTCI events. Thus, the result is not biased towards the largest events. The dominating metric for a high cumulative intensity can also be identified by calculating the ratios of cumulative intensity to duration or number of grid points. The cumulative intensity of the other indices was also compared, using only areas and grid points with heat wave conditions detected by the HWMId. That way, duration and spatial extent are constrained. This implies that for all heat wave grid points identified by EHF, WBGT and UTCI, only grid points that are also identified by HWMId (all grid points ≥ 0 in HWMId) are accumulated for heat wave metrics, keeping only one degree of freedom for cumulative intensity. This shows that differences in the indices are related to their mathematical definitions. Looking at the metrics in Table 2 for the 20 heat waves individually, the main influence factors for a high cumulative intensity besides the percentiles can be identified: For example, a long duration and a moderate magnitude or a shorter duration and a large area.
The 20 strongest events are visualized in bubble plots, ranked using large-scale intensity. The most striking feature of Figure 3 is that the 2010 Russian heat wave exceeds by far the duration and large-scale intensity of all European heat waves, being thus located at the top right corner of the diagram for all four indices. This does not hold for the maximum areal extent, for which the 2018 heat wave has comparable size for EHF, WBGT, and UTCI. HWMId shows the largest distinction between the 2010 heat wave and all other events.
The 2014 heat wave affected primarily Scandinavia, but its footprint is not seen completely in Figures S3 and S4 since the 2018 heat wave was more pronounced in most of the area. Interestingly, the 2003 heat wave ranks third in both HWMId and EHF for intensity but fifth in WBGT and tenth in UTCI. This can be explained by (i) different lengths of the uninterrupted series that fulfil the heat wave criteria, (ii) the lengths and intensities of the other events identified by the respective index and (iii) the input parameters. In general, the area is large and temperature percentiles are high, while wind percentiles are comparably low. The 2019 heat waves are amongst the 20 strongest in HWMId and EHF, but do not stand out in all metrics because they consisted of two separate (June and July) events. In contrast to 2018, both 2019 heat waves were rather intense, but short-lived (Figure 1). In 2007, 2012 and 2017, strong heat waves impacted Mediterranean and Balkan countries. In 2015, several heat waves originated from and persisted on the Iberian Peninsula. Note that heat waves in the Mediterranean might be 'underestimated' in their spatial extent since ocean grid cells are not considered. Figure 4 facilitates the comparison of heat wave metrics among indices for the five strongest European heat waves identified by HWMId. The 2010 heat wave has a lower large-scale intensity in HWMId and EHF compared to the other two indices due to differences in duration and affected area. The 2003 heat wave has a much longer duration in EHF than in all other indices, which is continuously captured from July to August. The 2014 heat wave shows a relatively large separation in duration between WBGT/UTCI and the other two indices. WBGT and UTCI exceed thresholds earlier and for longer than HWMId, and EHF shows a later onset. For the 2007 heat wave, almost all bubbles overlap and are very similar in size. The 2001 event differs again in terms of duration, which is identified differently by the indices and spatial criteria.
Our results clearly show that temperature is the main driving factor for all four indices. The explained variance from multi-linear regression can be taken from Table S1 (for the stations). Humidity explains more of the variance of WBGT than of UTCI, and UTCI shows higher wind percentiles during the events. Physically plausible changes in radiation barely have any impact. Also in Figure S5 (for the large scale), WBGT is more sensitive to vapour pressure than UTCI, which is why wind exerts a comparatively larger influence in UTCI. Similarly to the two stations considered, the indices are barely sensitive to physically realistic radiation changes.

Note:
The interquartile ranges of the percentiles and the other metrics are indicated in square brackets.

FIGURE 3 The 20 strongest heat waves by cumulative intensity (logarithmic y-axis, digits are multiples of power of 10) for all events, duration (x-axis) and area (size of the bubbles)

FIGURE 4 The five strongest heat waves compared, by cumulative intensity (y-axis) for all years, duration (x-axis) and area (size of the bubbles). The year bubbles on the right side are sorted for the highest cumulative intensity of HWMId in descending order. For each year, the ranking of the sizes for all indices are also indicated

| CONCLUSIONS AND OUTLOOK
We analysed historical European heat waves between 1979 and 2019 with different indices and a multi-component Lagrangian metric (regarding intensity, duration and areal extent) to assess their large-scale characteristics in a comparable form. The four heat wave indices HWMId, EHF, UTCI and WBGT use different meteorological input variables. To warrant comparability, they were normalized and the impact of the input variables was assessed using a multi-linear regression analysis. The comparison for the record-breaking heat wave in 2019 at Cologne-Bonn and Montpellier revealed that while our approach made the indices more comparable, the differences could be largely explained by the impact of the meteorological input parameters. Temperature does explain the largest fraction of the variance, and radiation the smallest. For WBGT, humidity is nearly as important as temperature while UTCI is more sensitive to wind speed.
The three metrics (intensity, duration, spatial extent) were combined by introducing a cumulative intensity measure, and European heat waves 1979-2019 were ranked and visualized in a single diagram (bubble plot).
The 2010 Russian heat wave is by far the most extreme event in duration and large-scale intensity in all indices. The 2018 heat wave was comparable in size to the 2010 event for EHF, WBGT and UTCI. Interestingly, the 2003 central European heat wave was only the fifth and tenth strongest in cumulative intensity in WBGT and UTCI, respectively. The June and July 2019 heat waves were very intense, but short-lived, thus not belonging to the top heat waves. In terms of areal extent, the present method underestimates heat waves in areas with marginal seas since grid points over water are omitted.
While the normalisation generally allows the four indices to be compared, differences still arise because the heat wave definitions are not entirely consistent. Specifically, HWMId events cover smaller areas but capture intense temperature and dryness peaks. Thus, the 2010 Russian heat wave is more distinct in HWMId from the other heat waves than in the three other indices. EHF events are more spread out in time and space (3-day means), thus more affected by, for example, warm and humid nights and with stronger dependence on seasonality. Moreover, WBGT and UTCI capture temperature and radiation peaks, while WBGT is more sensitive to humidity, and UTCI to wind. In general, the percentile approach and the variation of metrics between events and indices provide important information on the dominant factor leading to high cumulative intensity and on the robustness of the metric.
To date, only a few other studies have taken a large-scale perspective on heat waves that considers tracking and cumulative intensity as presented here (Lyon et al., 2019; Sánchez-Benítez et al., 2019). Lo et al. (2021) recently demonstrated a worldwide application of a Depth-First Search Algorithm for heat wave identification. With these methods, trends in intensity, spatial extent and duration can be calculated and results can be compared between studies and models, for example, the newly available CMIP6 projections. As the combination of duration and intensity over large areas is responsible for the most severe health and economic impacts, interdisciplinary research (e.g., links to health effects) is required to better quantify the impacts of heat waves in a warming climate.