Nonstationarity in High and Low‐Temperature Extremes: Insights From a Global Observational Data Set by Merging Extreme‐Value Methods

We merge classical extreme value methods to extract high (high temperatures (HT)) and low (low temperatures (LT)) temperatures and form time series having at least one extreme value per year. Observed daily maximum and minimum temperature records are used from 4,797 quality‐controlled, global, surface stations over 1970–2019. We assess changes in the magnitude and frequency of extreme temperatures by introducing and applying novel methods that exploit the definition of stationarity. Analysis shows significant increasing (40.6% of the stations) and decreasing (41.1%) trends in the frequency of high and LT, respectively, and increasing trends in both high‐ and low‐temperature values (35.6% and 49.7%). Globally, HT and LT frequencies are increasing and decreasing, respectively, by 0.9% and 1.1% per year, relative to the expected frequencies under the assumption of stationarity. The global mean annual HT and LT magnitudes are increasing by 0.004 and 0.016°C/year compared to the expected ones under stationarity. The results indicate that the assumption of stationarity fails to explain the observed changes. The proposed methods are an alternative approach to classical extreme value methods and a useful tool to reveal changes in extremes in the era of earth‐system change.

10.1029/2023EF003506 2 of 16 For example, a low threshold increases the number of values in the POT series but compromises the independence of data and affects trend estimates (e.g., Markonis et al., 2019).A universally accepted threshold-selection method does not exist (Durocher et al., 2018) and several techniques have been suggested in the literature (e.g., Deidda, 2010;DuMouchel, 1983;Dupuis, 1999;Ferreira et al., 2003;Liang et al., 2019;X. Pan et al., 2022;Schneider et al., 2021;Solari & Losada, 2012;Thompson et al., 2009).Many studies used the so-called Expert Team on Climate Change Detection Monitoring and Indices (ETCCDI; WMO et al., 2009) to estimate extreme temperature trends (e.g., Alexander et al., 2006;Rajulapati et al., 2022;P. Zhang et al., 2019).ETCCDI include duration and range indices, absolute indices (similar to AM), and percentile and threshold indices (similar to POT).The latter use a universal threshold and may not be representative on a global scale.
Here, to obtain the extreme temperature time series, we combine notions from AM and POT.We set the series of extremes as the values larger than or equal to the minimum of the AM series as HT.Similarly, we consider the values lower than or equal to the maximum of the annual minima as the LT.This approach guarantees that the formed HT-LT series has at least one extreme value per year (H1, L1).This concept was briefly mentioned in a short communication by Langbein (1949) in the context of floods but has yet to be applied massively to global climatological data sets.With some exceptions (e.g., Papalexiou et al., 2018), most of the global extreme temperature trend assessments have used gridded or reanalysis data.Here we apply the approach to 4,797 quality-controlled raw station observations from a global data set of maximum and minimum temperatures over 1970-2019.We further employ novel assessments to estimate global magnitude and frequency trends for data sets with a variable number of stations and records per year.Results are discussed and compared with AM and POT outcomes for various thresholds.

Data
We use raw station observations of maximum and minimum daily surface temperature from a global data set (Tang et al., 2021a) which was used by Tang et al. (2021b) as input to construct a Station-Based Serially Complete Earth data set (SC-Earth) from 1950 to 2019 (see also Tang et al., 2021cTang et al., , 2022)).Tang et al. (2021aTang et al. ( , 2021b) ) merged raw observational data from the biggest ground-based observational data sets, that is, the Global Historical Climate Network Daily (GHCN-D; Menne et al., 2012) and the Global Surface Summary of the Day (GSOD; National Oceanic and Atmospheric Administration (NOAA) (2021)).Both maximum and minimum temperatures reach quasi-global coverage from the 1950s until 2019.The data set comprises about 35,000 stations.
Observed data have missing values and extracting extreme series from such records can be problematic.Thus, here we select stations having at least five complete years per decade, with a complete year defined as one having less than 10% of missing values; similar criteria have been applied in previous studies (e.g., Vose et al., 2005).This screening leads to a limited number of stations (2,339), therefore, we restrict the period to 1970-2019, when warming has been reported to accelerate (e.g., Sarkar & Maity, 2021;P. Zhang et al., 2019).The resulting subset comprises 4,797 stations (most of which) having at least 40 years of data (Figure 1c) uniformly distributed with time (Figure 1b), and mainly located in the USA, and Europe (Figure 1a).

Methods
We assess changes in the magnitude and frequency of high and low daily temperatures.First, we use daily maximum and minimum temperature time series from the surface stations data set to define the series of extremes, and then we assess changes in magnitude and frequency at the station level, regionally in 2° × 2° grids, and globally.

Defining the Extreme Temperature Time Series
With the goal of including extremes from every year, we merge notions from the AM and POT methods to extract the extremes series.We define HT as all values greater than or equal to the minimum of the annual maxima series.Thus, the time series of HT formed will have at least one value per year; we abbreviate this series as H1.Of course, this approach can be generalized to form, for example, H2 or H3 series comprising at least two or three values per year.Accordingly, we form series of LT comprising values lower than or equal to the maximum of the annual minima (abbreviated as L1).
Specifically, to form the H1 series (Figure 2a): (a) extract the annual maxima of daily maximum temperature (b) estimate their minimum, and (c) select the values greater than or equal to the previous minimum.Accordingly, for 10.1029/2023EF003506 3 of 16 the L1 series (Figure 2c): (a) extract the annual minima of daily minimum temperature, (b) estimate their maximum, and (c) select the values lower than or equal to the previous maximum.To assess changes in the frequency of daily maximum and minimum temperature extremes we form occurrence series by counting the number of extremes per year in the H1 and L1 series (Figures 2b and 2d).The time series of H1 and L1 annual occurrences are denoted as NH1 and NL1, respectively.

Trend Analysis for Frequency and Magnitude
Next, n denotes the number of extreme occurrences (either maximum or minimum); s = 1,…,4,797 and y = 1,…,50 refer to a station s and a year y, and Y s is the total number of years with available data in station s.
To estimate the change in high and low-temperature frequency, we extend and modify a method previously applied to precipitation.Specifically, Papalexiou and Montanari (2019), in a time series of N years, considered the N largest values in each station and showed that under the assumption of stationarity, on average, one extreme per year is expected.They then estimated the change of the mean observed extremes per year relative to the expected (one) extreme per year.Here, the expected number of extremes per year is higher than one and varies among stations.Thus, for the trend analysis of each individual station, we assess the ratio of the observed to expected NH1 (or NL1) per year which fluctuates around 1. We then estimate the linear regression (LR) slope of this ratio versus years for each station.Specifically, the expected number of extremes per year for each station s is   = ∕ where n s is the total number of extreme occurrences in station s for all years.The frequency ratio per year is   = ∕| where n s|y is the observed number of extremes in year y for station s.The ratio should be constant (slope equal to zero) under stationarity.A t-test is performed to evaluate the slope's significance at a 5% significance level (or 95% confidence level).The spatial distribution of the stations' trends is graphically depicted considering the average frequency rate (slope) of the stations located inside 2° × 2° grid cells.
The single-station approach for the estimation of frequency changes can be generalized to assess a global extreme temperature trend using all stations' series.Specifically, to examine the global change in the number of extreme events relative to the expected number of extremes per year: 4. Estimate the frequency ratio   = ∕ for every year which fluctuates around 1. 5. Plot f y versus y and estimate the LR slope of this time series (referred to as the Global Frequency (GF) ratio).
Under stationarity, this slope should be zero.
We use a similar rationale for the magnitude slopes (rates), but in this case, we estimate the difference between the mean annual extreme temperature and the expected extreme annual temperature.′ | versus years should be zero under stationarity.Again, we perform a 95% confidence level t-test for the slopes' significance and the spatial distribution of the stations' trends is graphically depicted considering the average magnitude rate (slope) of the stations located inside 2° × 2° grid cells.
The single-station approach for the estimation of magnitude changes can be again generalized to estimate a single Global Magnitude (GM) trend; we simply average the ′ | of all 4,797 stations every year to obtain the mean The mean annual difference d y fluctuates around zero (based on the law of large numbers) and when plotted versus the years, the LR slope is expected to be zero under stationarity.This plot will be referred to as the GM difference (or GM).
A non-parametric slope estimator is also applied to validate the LR slopes and significance.A typically used approach for the assessment of trends is the Theil-Sen estimator (Sen, 1968;Theil, 1950) which computes slopes of lines crossing all possible pairs of points.The median of these slopes is the slope estimator.Here we apply the Siegel repeated medians estimator (A.F. Siegel, 1982).In this, the slopes between each point and every other point are calculated (resulting in n − 1 slopes) and the median is taken.This results in n medians and the slope estimator is the median of these medians.To evaluate the significance of the estimated trends, we use the non-parametric Wilcoxon test (Wilcoxon, 1945), popularized by S. Siegel (1956).

Extreme Temperature Series of Annual Maxima (Minima) and Peaks Over (Below) Threshold
We use AM as an abbreviation of both Annual Maxima and Minima, and similarly, POT abbreviates both Peaks "Over Threshold" and "Below Threshold".Trend assessments described in Section 3.2 for L1-H1 are also applied to the extreme series of AM and POT for the same subset.The results from all methods are compared.To estimate the AM extreme temperature series, we simply consider the maximum (minimum) annual values of each station (1 extreme/year).For POT, a commonly used threshold is the 90th percentile for HT (DuMouchel, 1983) and, accordingly, the 10th percentile for LT.We then select the values over (below) the selected percentile as the HT (LT) series.We denote the approach as POT10p (POT 10%) for both HT and LT; the abbreviation refers to the upper or lower 10% of the series that we consider.We note that for most stations, the L1 threshold (i.e., the minimum value of each L1 sample) is lower than the 10th percentile and the H1 threshold (i.e., the maximum value of each H1 sample) is higher than the 90th percentile (Figures S1 and S2 in Supporting Information S1), meaning that the L1-H1 sample sizes are almost always smaller than the POT10p ones.Additionally, we examine as thresholds for HT the 95th percentile (POT5p), which is numerically close to the mean percentile (upper 4.18th percentile) considered in H1 (Figure S1 in Supporting Information S1), and, finally, the 99th (POT1p) percentile (accordingly, the 5th and 1st percentile for LT).

Further Applications of H1-L1 and Comparisons With Peaks Over Threshold
Since the presented approach selects a threshold to define the extreme series, it has applications in the statistical analysis of extreme events, similar to POT.It can be used to determine the threshold for fitting the Generalized Pareto (GP) in the stationary case or the non-stationary GP in the non-stationary case.To examine the potential of the method in the extreme value analysis, we compare its performance with POT5p for the stationary extreme series.The POT5p is chosen since the average global threshold determined by H1 (upper 4.18th percentile) is numerically closer to the upper 5th percentile compared to the other tested percentiles.We also estimate the return levels of HT for each one of the two approaches.We use the Maximum Likelihood Estimation (Fisher, 1922) to fit the GP and the Akaike Information Criterion (AIC; Akaike, 1974) to assess the goodness of fit.
Further, we examine the potential of the proposed method as a means for selecting the threshold for the GP.We consider a graphical method used for threshold selection, that is, the Mean Excess Plot (MEP) (e.g., Beguería & Vicente-Serrano, 2006;Benktander & Segerdahl, 1960;Chukwudum et al., 2019;Lana et al., 2019;Roth et al., 2016).In this, the mean excess function of the ordered data against the ordered data is plotted (Nerantzaki & Papalexiou, 2019).The appropriate threshold is the abscissa right of which the mean excess values stabilize or exhibit a relatively constant behavior.Above this threshold, the assumption of the GP can be considered valid, and the tail distribution can be estimated reliably.We present examples illustrating how the thresholds defined by H1 and POT5p compare to the thresholds suggested by MEPs.

High-Temperature Frequency and Magnitude
Changes in the H1 frequency reported here refer to the relative change of the annual H1 occurrences (NH1) compared to the expected annual H1 occurrences.About 75% of the data set's stations exhibit increasing rates of 10.1029/2023EF003506 6 of 16 H1 frequency (Figure 3a), with 40.6% being significant (34.9% non-significant).Only 6.4% of the stations have statistically significant decreasing trends (Figure S3 in Supporting Information S1).The H1 frequency (Figure 3a) is increasing for most of the world, except for a land strip with significantly decreasing rates, extending from the north-central to the southeast of the USA (Figure S3a in Supporting Information S1).This phenomenon of "cooling" in the eastern USA has been termed a "warming hole" (Z.Pan et al., 2004).Z. Pan et al. (2004) found that it is associated with changes in low-level circulations leading to the replenishment of seasonally depleted soil moisture and increasing late-summer evapotranspiration, suppressing daytime maximum temperatures.Long-term phase changes in the North Atlantic Oscillation also contribute significantly to this cooling (for the northeast and southern USA), along with the Pacific Decadal Oscillation (for the southern USA) (Mascioli et al., 2017).The whole of Europe has significant increasing trends, and so has southeast Australia and the western USA.The GF ratio slope is 0.009 (Figure 3b) that is, the H1 frequency has been increasing by 0.9% per year during 1970-2019.The regression is significant at the 95% confidence level (b = 0.009, R 2 = 0.35, F(1, 48) = 25.69,p < 0.000), rejecting the hypothesis of stationarity.Considering that the mean of the annual H1 occurrences is 16.7 extremes/ year, the rate indicates that the H1 occurrences have increased by 7.5 H1 events by the end of the 50-year period.Specifically, the last decade of the study period experienced 22% more H1 occurrences compared to the expected annual frequency, while during the first decade, the H1 occurrences were 18% less than expected (Figure 3b).
The magnitude trends express the difference between the mean annual observed and expected H1 temperature.Significant increasing rates of H1 magnitude are observed for more than one-third of the stations (35.6%) during the study period, and 13.0% of the stations exhibit significant decreasing rates (Figure 4a).A similar spatial pattern exists between the significantly increasing rates of H1 frequency and magnitude, found over most of Europe, southeast Australia, and the western USA (Figures S3a and S4a in Supporting Information S1).Previously, Brown et al. (2008) also identified Europe as one of the hotspots of HT change, with a mean warming of 1°C, over 1950-2004.Here, we estimate an increase of 0.7°C for Europe during 1970-2019 (an average increase of 0.014°C/year, Figure 4a).The eastern USA exhibits significant decreasing trends in H1 magnitude, following the patterns of H1 frequency (Figures S3b and S4b in Supporting Information S1) and again demonstrating the warming hole in the USA.Southwest China has a cluster of significantly decreasing H1 frequency change rates (Figure S4b in Supporting Information S1).Zhao et al. (2020) also detected decreases in the daytime hot extreme, warm days, and warm spell duration in this region, which they attributed to past changes in the local anthropogenic aerosol concentrations.The slope of the GM difference is b = 0.004 (statistically significant regression with R 2 = 0.16, F(1, 48) = 8.99, p = 0.004) and indicates that the global temperature increases by 0.004°C per year (Figure 4b).Specifically, the first and last decades of the 1970-2019 period have experienced a mean difference of −0.09°C and +0.09°C compared to the expected annual H1 magnitude.

Low-Temperature Frequency and Magnitude
The mean annual L1 frequency ratio is decreasing for most stations globally (Figure 5a).Some significant increasing trends are found on the southeast coast of Australia, but they account for only 2.0% of the stations (Figure S5a in Supporting Information S1).Significant decreasing rates characterize 41.1% of the stations and occupy most of the USA, Europe, and the eastern coasts of Asia and Australia (Figure S5b in Supporting Information S1).Additionally, almost half the stations (47.9%) have decreasing rates, although not significant.In their review, Seneviratne et al. (2012) also found that most regions in the world tend to experience fewer cold days and nights.The GF graph (Figure 5b) indicates a decrease in the GF of L1 events with a slope b = −0.0105and a statistically significant regression at the 95% confidence level (R 2 = 0.35, F(1, 48) = 25.49,p < 0.000) again rejecting the hypothesis of stationarity.Since on average the L1 series considers 13.9 extremes/year, the decrease in the GF graph corresponds to a decrease of 7.3 L1 events during the 50 last years.During the first and the last decade of the study period, the L1 occurrences increased by 25% and decreased by 13% compared to the expected occurrences, respectively.The global H1 and L1 rates of change are equivalent (similar absolute values) but with opposite signals.
Most of the stations' L1 magnitude change rates are increasing (83.3%) with 49.7% being significant.The increasing L1 values translate into a decrease in the extremity of cold temperatures, which tend to become hotter with time.The percentage of stations with significant increases in their low-temperature magnitude (49.7%) is higher than the one of stations with significant increases/decreases in their high and low frequencies respectively (41.1% and 40.6%), and even higher than the percentage of stations with significant increases in high-temperature magnitude (35.6%).Only 4.1% of the stations have significantly decreasing L1 magnitude trends; these are mainly clusters in central-east Europe and at the borders of Kazakhstan and Russia (Figure 6a, Figure S6b in Supporting Information S1).We observe that the warming hole of the USA is not expressed through LT magnitude and frequency trends, which is in accordance with P. Zhang et al. (2019) (Figures 3 and 4 of their publication for magnitude and frequency, respectively).The trend of the GM difference (Figure 6b) is positive (b = 0.016) with  S1 in Supporting Information S1) for magnitude or frequency.The trends estimated by the two methods for the GF ratio and the GM difference graphs are also identical (Table S1 in Supporting Information S1).We additionally compare the percentage of significantly increasing and decreasing rates estimated by the two methods.The TSS estimates similar percentages of significant increasing and decreasing (magnitude and frequency) rates for a 99.9% confidence level as the LR for a 95% confidence level.These results suggest that TSS is a more robust approach compared to the LR but also reinforce the LR estimates indicating that the estimated trends are consistent across different statistical approaches.

Comparison With Annual Maxima (Minima) and Peaks Over (Below) Threshold
The number of HT events per year is, on average, 16.7, 39, 19.7, and 3.9 for H1, POT10p, POT5p, and POT1p series, respectively.The corresponding mean number of LT events is 13.9, 38.6, 19.4, and 3.8.The tested methods (AM, POT for various thresholds, and H1-L1) present similar spatial distributions (Figures S13a-S26a in Supporting Information S1) of significant frequency and magnitude rates (e.g., Figures S7 and S8 in Supporting Information S1 for the significant rates in Annual Maxima and Minima).POT1p's spatial patterns slightly deviate from the rest, due to the unrepresentative extreme series defined by the high threshold (Figures S23a-S26a in Supporting Information S1).
The significance of the estimated rates is affected by the sample size.Therefore, we assess the number (percentage) of stations with statistically significant rates for each method over a range of confidence levels (90%, 95%, 99%, 99.5%, 99.9%; Figure S27 in Supporting Information S1).As expected, as the threshold increases for HT (decreases for LT) the significance of the non-zero rates decreases (Figure S27 in Supporting Information S1).As a result, there are fewer stations with statistically significant high-temperature rates as the threshold increases (with low-temperature rates as the threshold decreases).
All methods agree that the number of stations with significant increases in their LT magnitude is higher than the one of stations with significant increases or decreases in their HT or LT frequencies respectively, and even higher than the number of stations with significant increases in HT magnitude (Figure S27 in Supporting Information S1, Table 1).The percentage of statistically significant stations of POT10p (for both frequency and magni tude) is higher compared to the rest of the methods, owing to the larger extreme value series that the method considers Note.For each variable, we show the percentages of stations with significant rates (for 95% and 99.9% Confidence Levels; CL), the mean rates of change (estimated as the global mean of the average rate within each 2° × 2° grid cell), and the Global Magnitude or Frequency Rate (estimated by the global graphs of magnitude difference and frequency ratio).

Table 1 Main Results From the Comparison Among the Presented Method (H1-L1), Annual Maxima and Minima (AM), and Peaks
Over Threshold Using Several Percentiles (99th,95th,and 90th Corresponding to POT1p,POT5p,and POT10p) (Figure S27 in Supporting Information S1, Table 1).Although the HT-LT sample is larger in POT1p compared to AM, the former has systematically lower percentages of stations with significant rates.This shows that changes in the exceedances over a very high threshold (or in the values lower than a very low threshold) are less significant than the ones of annual maxima (annual minima).L1-H1 and POT5p have similar percentages of significant rates in most cases (Table 1) since the average percentile considered in L1 (4.33th percentile) and H1 (95.82th percentile) is numerically close to the 5th and 95th percentile, respectively (Figure S1 in Supporting Information S1).
The GF graphs, which include values from all stations, whether they are statistically significant or not (as explained in Section 3.3), reveal that the rate of change in frequency becomes more pronounced as the threshold increases for HT or decreases for LT (Figure S28 in Supporting Information S1 and Table 1).To clarify, as the sample sizes of extreme data series increase or, equivalently, as the threshold for HT decreases and the threshold for LT increases, the absolute values of GF rates decrease.This suggests that the more extreme the HT or LT magnitudes are, the higher the increase or decrease in their frequency rate, respectively, over time.Similar conclusions are drawn by the mean rates of frequency change (estimated as the global mean of the average rate within each 2° × 2° grid cell, Table 1).Additionally, we note that the absolute values of GF rates for HT and LT are numerically close to each other for each independent method, but with opposite signs (Table 1).
The GM rates (Table 1) are steeper for Annual Maxima and Minima (0.014 and 0.055°C/year for HT and LT, respectively) compared to H1-L1 (0.004 and 0.016°C/year), POT10p (0.004 and 0.015°C/year), POT5p (0.003 and 0.015°C/year), and POT1p (0.000 and 0.011°C/year) (Table 1).Previous studies agree with this AM outcome and have detected increases of 0.20°C and 0.14°C per decade in LT and HT using the climate anomaly method (Vose et al., 2005) and 0.19°C per decade using Annual Maxima (Papalexiou et al., 2018) after 1950.The significance of the GM rates for AM is also higher compared to the other methods, especially for LT.Clearly, the L1-H1 series show moderate increases compared to previous studies (e.g., AM) due to the different definitions of the extreme time series.The rate of increase for the AM temperatures is higher than the rate of increase for the threshold-based approaches; a detailed explanation for these discrepancies is given in Section 4.4.The GM trends are almost identical for POT10p, POT5p, and L1-H1 (Table 1, Figure S28 in Supporting Information S1).The POT1p has a lower (and non-significant) GM rate for HT compared to the rest of the methods; this demonstrates that the magnitude of the values over a very high threshold does not change as significantly as the magnitude of Annual Maxima or as the peaks over a lower threshold.

Differences With Annual Maxima
Annual Maxima (Minima) of HT (LT) increase (decrease) faster than the extreme temperatures higher (lower) than a threshold (Section 4.3).In H1 (L1), each year considers the AM values (by definition) plus some additional values that lie between the lowest and the highest AM values.The distribution of these intermediate values in time defines the H1 (L1) rates and varies depending on the change in the H1 (L1) frequency relative to the change in magnitude.
Based on the spatial distribution of the frequency and magnitude rates (Figures 3a and 4a), we distinguish mainly three combinations of frequency and magnitude rates for HT: (a) increasing frequency and increasing magnitude rates (66.3% of the stations), (b) decreasing frequency and decreasing magnitude rates (20.5% of the stations), and (c) increasing frequency rates and decreasing magnitude rates (12.5% of the stations).Only a few stations (0.6%) have a combination of decreasing frequency rates and increasing magnitude rates of H1.
For case (a), the AM rates are typically higher than the H1 ones (for 95.2% of the stations).The common case is that the increasing H1 frequency translates to more "intermediate values" at the end of the study period compared to the beginning, alleviating the H1 slope value (see e.g., Figure S29 in Supporting Information S1).A similar effect exists for case (b), but now the clusters of intermediate values exist in the first half of the case study (Figure S30 in Supporting Information S1), thus again resulting in higher AM rates (absolute values) compared to H1 rates (for 100% of the stations in this category).The rarer case (c) exists when the magnitude of AM extremes decreases and their frequency increases (e.g., southeast China, Figures 3 and 4).In such cases, there may be absolute AM rates higher (83.4%) or lower (16.6%)than H1 magnitude rates, depending on the relative increase in frequency compared to the decrease in magnitude (e.g., Figures S31a and S31b in Supporting Information S1).
Overall, the three conditions described above contribute to 94% of the stations' AM magnitude rates being higher than the corresponding H1 rates, and thus the AM provides more pronounced magnitude rates compared to the H1 approach.Similar assessments can be deduced for the comparison of Annual Minima with the L1 series.

Differences With Peaks Over Threshold
In this global assessment of extreme temperature trends, the H1 method considers, on average, the upper 4.18th percentile, ranging from the upper 0.8th to the 40th percentile.The upper 5th percentile is a typical threshold considered in POT.Therefore, the two approaches are approximately equivalent in this global assessment.However, regionally, the introduced approach can deviate from the 95th percentile (Figure S1 for H1 and Figure S2 for L1 in Supporting Information S1).For example, the H1 series in the southeast USA and the L1 series around the Baltic Sea consider more values compared to POT5p (Figures S1 and S2 in Supporting Information S1).Conversely, the H1 trends in the northern USA and Canada and the L1 trends in the southeast USA can be estimated by fewer values compared to the POT5p approach.Next, we investigate how the different definitions of extremes based on the two methods affect the estimated trends.
We first examine H1 adopting a lower threshold compared to POT5p (e.g., the upper 15th percentile, Figure S32 in Supporting Information S1).The H1 magnitude shows a slightly negative slope (−0.0016°C/year) in this case.
When considering the upper 5th percentile (POT5p) for the same station, (a) seven years are excluded from the analysis (1995,1997,(1999)(2000)(2009)(2010)(2011)(2012), and (b) the slope becomes slightly positive.Similarly, applying POT1p, (a) 20 years are excluded from the analysis (with large periods of consecutive missing years), and (b) the slope becomes significantly positive.The cluster of extreme values during 2012-2014 has a pronounced impact, pulling the slope toward higher values as the threshold increases.Eventually, the higher threshold of POT1 captures only outlier trends.Similar differences are observed in the frequency trends of the station with the observed-to-expected frequency ratio increasing as the threshold rises (Figure S33 in Supporting Information S1).
We then examine the opposite case, when H1 adopts a higher threshold compared to POT5p, that is, the time series has low interannual variability in extremes (Figure S34 in Supporting Information S1).For this station, the H1 method adopts the upper 1st percentile to define the extreme series.In both cases (H1 and POT5p) the magnitude slope is marginally positive, but the slopes estimated by the two methods are slightly different (similarly for the frequency slope, Figure S35 in Supporting Information S1).It is obvious that the POT5p approach has needlessly included values that are not "rare" or extremes, affecting the estimated slopes.
In this analysis, for 75.2% of the stations, H1 selects a higher threshold than POT5p (a lower threshold for 24.8% of the stations).Similarly, L1 selects a lower threshold than POT5p for 80.4% of the stations (a higher threshold for 19.6% of the stations).Each approach yields different trends depending on the defined extreme series, with more noticeable distinctions in certain regions, particularly when H1 (or L1) employs a higher (or lower) threshold compared to POT5p (e.g., Scandinavia).

Applications of H1-L1 for the Statistical Modeling of Extremes
Based on the AIC, the GP is a better fit for H1 samples compared to POT5p samples for 75.2% of the stationary series (2,468 stationary series).This percentage aligns with the portion of stationary series for which the H1 threshold is higher than the upper 5th percentile used in POT5p (82.3% of the stations).For example, GP with H1 is a better fit for 98.5% of the cases where the selected threshold by H1 is higher than the upper 4th percentile while GP with POT5p is the best choice for 99.9% of the cases where the selected threshold by H1 is lower than the upper 6th percentile.Clearly, GP is fitted better to smaller samples of extreme values.We note that we tested several distributions to fit the extreme series, but GP consistently performed better, followed by the Generalized Extreme Value distribution (performed better for only 2% of the stations compared to GP).
We then estimate the return levels for each station from the GP distributions fitted using H1 and POT5p.Return levels of the GP distribution fitted using high thresholds are typically higher compared to the ones fitted using low thresholds.Therefore, following the thresholds' patterns, for 72% of the stations, the return levels using GP for H1 were higher compared to the ones estimated by GP for POT5p.For most of the stations, the differences between the estimated return levels using H1 and POT are minimal.When the absolute differences between the estimated return levels from H1 and POTp are over 0.5°C (635 stations), the H1 predicts higher return levels for 85.2% of the cases.Thus, the proposed method defines extreme series which estimate safer design values for extreme temperature compared to POT5p.
We, finally, examine the thresholds suggested by MEPs and compare them with the H1 threshold and the 95th percentile (POT5p).The MEP threshold is the abscissa left of which the plot is stabilized.We illustrate the juxtaposition of MEP, H1, and POT5p for two stations, one for which the selected H1 threshold is very high (Figure S36 in Supporting Information S1) and one very low (Figure S37 in Supporting Information S1).We notice that the H1 threshold agrees better with the threshold suggested by the MEP compared to the POT5p threshold (Figures S36 and S37 in Supporting Information S1).For the first case (a series for which H1 defines a threshold higher than the 95th percentile, i.e., 99.2th percentile) the POT5 threshold is unnecessarily small, being located in the middle of the stabilized area, while the H1 threshold is more sensible (Figure S36 in Supporting Information S1).For the second case (a series for which H1 defines a threshold lower than the 95th percentile, i.e., 71.8st percentile), the POT5 approach threshold is inside the "unstable" area, while the H1 threshold is exactly at the cut-off point between the stable and unstable regions (Figure S37 in Supporting Information S1).
A reason for the agreement of the MEPs with the H1 thresholds is that the stability needed in the MEP plot is guaranteed by the existence of extreme values for all years, satisfied by the H1 series.Although the statistical modeling of extremes is outside the scope of this paper, the use of the H1-L1 as an efficient way to automatically select a threshold for fitting the GP distribution is promising.

Discussion and Conclusions
Several recent papers used various approaches to estimate the change in the magnitude and frequency of extreme temperatures on global or regional scales.Most recent approaches for studying global extreme temperature trends involve the use of climate indices (e.g., La Sorte et al., 2021;P. Zhang et al., 2019), mainly the ETCCDI (Alexander et al., 2006;WMO et al., 2009).Others, such as Papalexiou et al. (2018) used classic statistical approaches (Annual Maxima) to define the extreme temperature series and their trends globally.A plethora of regional studies have recently surfaced aiming to assess extreme temperature trends mainly in Asia (Ahmad et al., 2023;Ding et al., 2021;Ma et al., 2022;Om et al., 2022;Wang et al., 2021;Yatim et al., 2019;Yosef et al., 2019;S. Zhang et al., 2022), but also Latin America (Meseguer-Ruiz et al., 2019;Ruiz-Alvarez et al., 2020;Sanches et al., 2023), Europe (Tošić et al., 2023) Antarctica (Wei et al., 2019), and New Zealand (Caloiero, 2017), among other regions.
Here we use an alternative method (H1-L1) for obtaining the extreme value series and assess global changes in extreme temperature.The method merges the AM and POT approaches, by defining the threshold for the extreme series as the minimum value of the Annual Maxima (or the maximum value of Annual Minima).Further, we discuss the differences in the extreme temperature trends estimated by the H1-L1, POT, and AM methods.
The motivation for this study was to assess extreme temperature trends of both frequency and magnitude globally, using a common framework for all stations.When assessing hydroclimatic trends at large scales, the extreme series should be defined universally.Setting a common (absolute or relative) threshold for all stations, as usually done in POT, does not guarantee a uniform assessment of trends among stations; for example, this threshold may only comprise a few years of the time series for some stations with clustered extremes.Clustered high-temperature extremes can occur when there are prolonged periods of HT, for example, due to natural climate patterns such as El Niño or La Niña, with durations of up to 2 years.Indeed, using the H1 series it is shown that a lower threshold (compared to POT5p) is needed in the areas around the Gulf of Mexico (in some cases the upper 15th percentile is considered, see Figure S1 in Supporting Information S1).In this context, the H1-L1 method acts as an index for the interannual variability of the time series of extremes, with high thresholds automatically selected for low-variability series, and, respectively, low thresholds for series with high variability, so that the latter is captured.Finally, the approach can simultaneously estimate both magnitude and frequency trends, which is not possible with AM.To summarize, based on the comparisons with POT and the use of Mean Excess Plots, the benefits of the proposed method include: 1.The analysis involves all available years, with every year having at least one extreme value, thus allowing to determine changes in the extreme values observed each year rather than in "clustered outliers", 2. The analysis does not include unnecessary values lower than the lowest of the annual maxima, 3. The temperature thresholds are automatically assigned to each series, therefore reducing the subjectivity in threshold selection.
We apply the method to LT and HT series from a subset of a global observational data set (4,797 stations) for the period 1970-2019, obtaining series with at least one extreme event per year (L1, H1).Exploiting the definition of stationarity, we estimate changes in the L1 and H1 magnitude and frequency and compare the results with the corresponding changes from AM (magnitude) and POT (frequency and magnitude).Overall, non-stationarity characterizes both the frequency and magnitude of extreme temperatures globally.The key results of this study are as follows: 1.During the last 50 years, the global stations have had predominantly increasing trends in the H1 frequency (40.6% significant) and magnitude (35.6% significant) and the L1 magnitude (49.7%, i.e., extremely cold temperatures tend to get warmer), and decreasing trends in the L1 frequency (41.1% significant).2. Spatially, the above-mentioned significant trends are distributed all over the world; the main area deviating from the patterns of the H1 trends, is a strip extending from the north-central to the east-central USA, owing to the "warming hole" phenomenon.3. The L1 series have experienced on average more intense changes compared to H1 in terms of magnitude (0.016°C/year and 0.004°C/year for LT and HT), while the global trends of L1 and H1 frequency have had similar relative changes (an increase of 0.9% per year and decrease of 1.0% per year for LT and HT). 4. All tested methods for the extraction of extreme series (L1-H1, AM, and POT with various thresholds) agree that there are more stations with statistically significant changes in LT magnitudes, compared to stations with significant changes in LT or HT frequency, and especially compared to stations with significant changes in HT magnitude.5.The change in the relative frequency of high (low) temperatures is more intensified as the threshold increases (decreases), that is, the higher (lower) the HT (LT) values the higher the rate of their frequency change.6.The Annual Maxima (and Minima) series have more intensified magnitude trends compared to the L1-H1 and POT magnitude series during the last 50 years, indicating that the annual maximum (minimum) temperature values increase faster than the values over (below) a threshold.7. Overall, the H1-L1 approach provides a better fit for the GP distribution and leads to higher return levels compared to POT using a fixed threshold (i.e., the 95th percentile).8.The agreement between the thresholds suggested by the Mean Excess Plots and the thresholds selected by H1 signifies that H1 may be a promising approach for automatic threshold selection.
The presented approach incorporates a simultaneous assessment of both frequency and magnitude trends for extreme temperature and can be applied to other variables, over either historical or future time series.However, caution is needed for its application to rainfall in arid environments (or other intermittent variables such as ephemeral flows); the existence of very low or zero rainfall values throughout a certain year may falsely lead to considering most of the series' values as extreme.The method is better suited for non-intermittent variables, such as temperature or non-intermittent flow, and can be generalized into other non-climatological variables.

Figure 1 .
Figure 1.(a) The global distribution of the number of stations, (b) Histogram of the number of stations per year (1970-2019), (c) Histogram of the number of stations versus their number of years.

Figure 2 .
Figure 2. Explanatory graph for the definition of the (a) high-temperature series (H1), (b) annual H1 occurrences (NH1), (c) low-temperature series (L1), and (d) annual L1 occurrences (NL1) for a station with maximum and minimum temperature time series of 50 years.

Figure 3 .
Figure 3. Changes in the frequency of high temperatures (H1) (annual ratio of observed to expected NH1).(a) Global distribution of the mean annual trends (average rate of change within each 2° × 2° grid cell), and (b) Annual Global Frequency ratio over the period 1970-2019.

Figure 4 .
Figure 4. Changes in the magnitude of high temperatures (H1) (difference between observed and expected H1).(a) Global distribution of the mean annual trends (° C/year, average rate of change within each 2° × 2° grid cell), and (b) Annual Global Magnitude difference over the period 1970-2019.

Figure 5 .
Figure 5. Changes in the frequency of low temperatures (L1) (annual ratio of observed to expected NL1).(a) Global distribution of the mean annual trends (average rate of change within each 2° × 2° grid cell), and (b) Annual Global Frequency ratio over the period 1970-2019.

Figure 6 .
Figure 6.Changes in the magnitude of low temperatures (L1) (difference between observed and expected L1).(a) Global distribution of the mean annual trends (° C/year, average rate of change within each 2° × 2° grid cell), and (b) Annual Global Magnitude difference over the period 1970-2019.
1. Estimate the expected number of extremes per year for each station s as