Coevolution of Extreme Sea Levels and Sea‐Level Rise Under Global Warming

Design of coastal defense structures such as seawalls and breakwaters can no longer be based on the assumption of stationarity. In many parts of the world, the anticipated sea-level rise (SLR) due to climate change will render present-day extreme sea levels inappropriate for future coastal flood risk assessments, since it will significantly increase their probability of occurrence. Here, we first show that global annual maxima sea levels (AMSLs) have been increasing in magnitude over the last decades, primarily due to a positive shift in mean sea level (MSL). Then, we apply non-stationary extreme value theory to model the extremal behavior of sea levels with MSL as a covariate and quantify the evolution of AMSLs in the coming decades using revised probabilistic sea-level rise projections. Our analysis reveals that non-stationary distributions exhibit distinct differences from simply considering stationary conditions with a change in location parameter equal to the amount of MSL rise. Using non-stationary distributions, we show that by the year 2050 many locations will experience their present-day 100-yr return level as an event with a return period of less than 15 and 9 years under the moderate (RCP4.5) and high (RCP8.5) Representative Concentration Pathways, respectively. We also find that by the end of this century almost all locations examined will encounter their current 100-yr return level on an annual basis, even if CO2 concentration is kept at moderate levels (RCP4.5). Our assessment accounts for large uncertainty by incorporating ambiguities in both SLR projections and non-stationary extreme value distribution parameters via a Monte Carlo simulation.

(Jackson & Jevrejeva, 2016; Kopp et al., 2014; Slangen et al., 2014; Wong et al., 2017). An elevated MSL results in several alarming situations, from permanent flooding of tropical atolls (Storlazzi et al., 2018) and degraded ecosystem resilience of wetlands (Sun et al., 2022), to erosion of coastlines (Vousdoukas, Ranasinghe, et al., 2020) and severe economic losses associated with coastal infrastructure damage and increased flood exposure of people and assets (Hinkel et al., 2014; Kulp & Strauss, 2019). This means that, with a higher MSL in the following decades, present-day ESLs are not representative of future coastal flood hazard and are thus inadequate for resilient infrastructure design, irrespective of the future variability in storm and tide activities (Vitousek et al., 2017). Hence, understanding future changes in the frequency of ESLs is essential for developing resilient coastal plans.
Several recent studies have assessed future changes in global/continental ESLs due to the impact of sea-level rise (SLR). Most of these studies have employed a process-based modeling approach, where physics-based model output is combined with statistical analysis. For example, hydrodynamic models can be forced with projected climate variables on top of sea-level scenarios to simulate ESLs that can be used with EVT (Muis et al., 2020; Vousdoukas et al., 2018), or historical model simulations in conjunction with the amplification factor (AF) (Hunter, 2012) plus SLR projections can be utilized to evaluate how frequently current ESLs will occur in the future (Vitousek et al., 2017). Yet, when model output is used, the uncertainty stemming from model parameterization and input forcing (Parrish et al., 2012; Pathiraja et al., 2018) is combined with the large uncertainty in sea-level projections (Ranger et al., 2013). Since EVT is also applied, this combined uncertainty blends with uncertainty from the extreme value distribution parameters as well (Wahl et al., 2017). As a result, inference on future ESLs becomes indeterminate due to large cascading uncertainties. Additionally, when the AF is used, as in past studies that have relied merely on tide gauge observations (Buchanan et al., 2016, 2017; Rasmussen et al., 2018; Tebaldi et al., 2012), the relationship between MSL and extreme events is only implicitly established through an increase in the location parameter equal to a future SLR scenario, while the extreme value distribution that generates ESLs is not explicitly modeled even though it is reliant on factors that evolve over time (Baldan et al., 2022; Ceres et al., 2017; Ragno et al., 2019).
In this study, we exploit GESLA-3 (Haigh et al., 2021), a newly published version of GESLA (Global Extreme Sea Level Analysis), a quasi-global tide gauge data set, in order to study the evolution of annual maxima sea levels (AMSLs) worldwide under a non-stationarity assumption. For the majority of tide gauges and years of record examined, the maximum observed water level during a year is associated with a lasting extreme weather system, for example, a storm, and thus with extreme sea states throughout the storm's duration. In this study, we therefore use the terms AMSLs and ESLs interchangeably, although AMSLs might, in instances when no extreme weather event occurred during a year, not reflect a true extreme sea state. It is also to be noted that not all ESLs result in coastal flooding, and there exists significant variability in the protective capacity of coastal defense infrastructure across the globe that yields different flooding frequency/severity patterns. However, AMSLs can still be good proxies for the expected frequency of coastal flooding in the years to come, assuming that no significant change/upgrade is implemented in coastal areas. Therefore, we first conduct a trend analysis of annual maxima still water levels and of water levels reduced to MSL, in order to test for significant positive trends in AMSLs. Our trend analysis is performed using the Mann-Kendall trend test, a well-established monotonic trend detection method, and is cross-checked with a relatively new technique based on quantile indicators that can capture strict stationarity of a time series. If positive trends are detected, we examine whether they can be statistically attributed to SLR. We then apply both stationary and non-stationary EVT (Bracken et al., 2018; Méndez et al., 2007; Razmi et al., 2017; Tramblay et al., 2013) to model the extremal behavior of sea levels without and with MSL as a covariate, respectively.
We test whether ignoring the covariability of distribution parameters with MSL results in a goodness-of-fit similar to that of the non-stationary distribution. Here, we show that non-stationary distributions better characterize the evolution of ESLs than solely considering frequency amplification due to a change in the location parameter equal to future SLR; we illustrate that the differences in future coastal flood hazard can be remarkable, depending on the approach used. In addition, we quantify the change in frequency of global ESLs in the years 2050 and 2100 under moderate and high greenhouse gas concentration scenarios (Representative Concentration Pathways [RCP] 4.5 and 8.5, respectively) using non-stationary distributions. We also show the coevolution of ESLs and SLR in the coming decades for large populated coastal cities. Subsequently, we implement an uncertainty analysis to quantify the combined contribution of non-stationary extreme value distribution parameters and probabilistic sea-level projections to uncertainty estimates of non-stationary return levels.

Sea-Level Annual Maxima and Projections
We retrieved sea level metadata for 5,119 tide gauges from the GESLA database Version 3.0 (GESLA-3) and kept 375 gauges with: (a) reported data in more than 55 years, and (b) an overall record quality of "No obvious issues." Then, we downloaded observed still water level data for these 375 tide gauges and extracted the annual maxima, representative of ESLs. We favored this approach over a peaks-over-threshold sample (Coles et al., 2001) because the latter is subject to notable uncertainties associated with subjective choices of the threshold (u) parameter and the time window (lag) used to ensure independence between sea-level peaks (Coles et al., 2001). Dedicated techniques would be needed for the automatic selection of u (Caballero-Megido et al., 2018), while a single optimal lag is not realistic when dealing with multiple tide gauges across the globe, where, depending on region and season, storms can last from a few days to more than a week due to persistent weather patterns (Bengtsson et al., 2009). It is important to mention, however, that block-maxima sampling discards information relative to peaks-over-threshold, in the sense that a full year of hourly tide gauge measurements is reduced to a single data point. As a result, other extremes that might occur during a year are not taken into consideration, limiting the available data from which inference on trends and frequency of extremes is made. Despite this limitation, block-maxima sampling is a more straightforward option because it does not involve any subjectivity, as is the case in the alternatives (e.g., peaks-over-threshold), and can still provide a good representation of extreme sea states for a location of interest. For a given year, if more than 20% of the GESLA data were comprised of "missing," "doubtful," "isolated spikes," or "wrong" values, we discarded the annual maximum value of that year.
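As a rough sketch of this block-maxima extraction with the 20% quality screen, the logic can be expressed as follows. The quality-flag codes and column names below are placeholders for illustration, not the actual GESLA-3 conventions:

```python
import pandas as pd

# Hypothetical flag codes standing in for GESLA's "missing", "doubtful",
# "isolated spikes", and "wrong" labels
BAD_FLAGS = {3, 4, 5}

def annual_maxima(df, max_bad_frac=0.20):
    """Annual maxima of still water level, discarding any year in which
    more than `max_bad_frac` of the record carries a bad quality flag.

    df: DataFrame indexed by timestamp, with columns 'level' (still
        water level, m) and 'qc' (quality flag).
    """
    out = {}
    for year, grp in df.groupby(df.index.year):
        bad = grp['qc'].isin(BAD_FLAGS)
        if bad.mean() > max_bad_frac:
            continue  # too much unreliable data: drop this year's maximum
        good = grp.loc[~bad, 'level']
        if len(good):
            out[year] = good.max()
    return pd.Series(out, name='amsl')
```

A year passing the screen contributes exactly one value (its maximum over the quality-controlled hourly record), which is the single-data-point-per-year reduction discussed above.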
Subsequently, we required that for a tide gauge station more than 55 annual maxima exist, corresponding to more than 55 years of reliable yearly maximum values. The latter resulted in a final number of 204 tide gauges. These 204 tide gauges constitute a reasonable "pseudo-global" spatial distribution (Figure 1) which, despite its limitation, can provide meaningful insights for coastal flood hazard of many parts of the globe, considering the generality of sea level information (Woodworth & Blackman, 2004), that is, the fact that nearby coasts exhibit similarities in both MSL and extremes. However, our analysis is not representative of parts where no tide gauges were selected whatsoever, for example, Africa and Southeast Asia, among others. The reduced annual maxima for these tide gauges were computed by subtracting the MSL of the respective year.

Figure 1. Overall distribution of tide gauges across the globe selected for trend analysis of annual maxima still water levels (top left panel). Positive and negative trends are colored red and yellow, respectively. Trends significantly (95% confidence level) different from zero are shown with circles, while others are displayed with squares. Zoom-in subpanels are shown for Europe, North America, Japan, and Australia.
For our analysis, we utilized sea-level projections at the tide-gauge level originating from Kopp et al. (2017), given as values relative (±) to the year 2000 CE baseline level. These revised sea-level projections are primarily derived from the same probabilistic framework adopted in Kopp et al. (2014), but with the substitution of an ensemble of physically modeled Antarctic ice-sheet (AIS) projections (DeConto & Pollard, 2016) for expert-judgment-based AIS projections. The former account for previously ignored physical processes such as ice-shelf hydrofracturing and ice-cliff collapse, and are thus better at capturing the complicated temporal dynamics of AIS retreat than the simple assumptions inherent in many expert-assessment-based AIS melt predictions (Bamber & Aspinall, 2013; Little et al., 2013). Because these sea-level projections are not well-constructed probability distributions but rather come as simulation frequency distributions, we modeled them with a Gaussian distribution. In particular, for each tide gauge of interest, we obtained the respective normal distribution by minimizing the sum of squared errors (SSE) of logarithmic probabilities of non-exceedance:

SSE = Σ_{i=1}^{n} [ln(p_i) − ln(C(q_i))]²  (1)

where n is the number of given percentiles, p_i is the probability of non-exceedance associated with the ith percentile, C is the normal cumulative distribution function, and q_i is the quantile, that is, the sea-level projection of the ith percentile. Since the sea-level projections come as nine distinct percentiles (0.01, 0.05, 0.167, 0.5, 0.833, 0.95, 0.99, 0.995, 0.999), including the 50th (the median), we optimized Equation 1 with respect to only the standard deviation, considering the mean of each Gaussian distribution to be known. Note that the optimization scheme assigns bigger weights to upper quantiles, since more values are known for percentiles above the 50th.
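A minimal sketch of this fit, assuming SciPy and using synthetic quantiles in place of the actual Kopp et al. (2017) projections, might look like:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def fit_sigma(p, q):
    """Solve Equation 1 for the standard deviation only, with the
    Gaussian mean fixed at the median (50th percentile) projection."""
    mu = q[p == 0.5].item()  # median projection taken as the mean

    def sse(sigma):
        # sum of squared errors of log non-exceedance probabilities
        return np.sum((np.log(p) - norm.logcdf(q, mu, sigma)) ** 2)

    res = minimize_scalar(sse, bounds=(1e-3, 5.0), method='bounded')
    return mu, res.x

# Sanity check with synthetic quantiles drawn from a known N(0.5, 0.2)
p = np.array([0.01, 0.05, 0.167, 0.5, 0.833, 0.95, 0.99, 0.995, 0.999])
q = norm.ppf(p, loc=0.5, scale=0.2)
mu, sigma = fit_sigma(p, q)  # recovers mu = 0.5, sigma close to 0.2
```

Because the objective compares log probabilities at the given percentiles, errors in the upper tail (where more percentiles are specified) dominate the fit, which is the weighting behavior noted above.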
The goal here is, however, to adopt a sensible probability distribution for the sea-level projections which can then be used in a Monte Carlo experiment (see Section 2.3), rather than to obtain the best distribution to characterize the full range of possible values of the projected MSL. Regardless, future SLR remains a topic of profound uncertainty (Kasperson, 2012), and best practice would be to follow a multiple-distribution method rather than relying on a single probability distribution (Heal & Millner, 2014).

Trend Testing
Trends in AMSLs were examined using the Mann-Kendall (MK) non-parametric test and a stationarity test based on quantile indicators, that is, quantics (see Text S1 in Supporting Information S1 for more details on the latter). The MK test, which is based on the correlation between ranks of earlier and later values of a time sequence (Collaud Coen et al., 2020), can capture not only linear trends, as a linear regression would, but also non-linear changes over time. The null (H0) and alternative (H1) hypotheses of the MK test assume that the data correspond to a sequence without and with a monotonic trend, respectively. The test statistic (S) is given by the following equation:

S = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} sgn(Z_j − Z_i)  (2)

where Z_j and Z_i are later and earlier values of the sequence, respectively, n is the length of the sequence, and sgn is a function expressed as:

sgn(Z_j − Z_i) = 1 if Z_j − Z_i > 0; 0 if Z_j − Z_i = 0; −1 if Z_j − Z_i < 0  (3)

BOUMIS ET AL. 10.1029/2023EF003649

S is approximately normally distributed when n ≥ 8 (Kendall, 1948; Mann, 1945), with mean and variance as follows:

E[S] = 0,  Var(S) = [n(n − 1)(2n + 5) − Σ_{i=1}^{k} P_i(P_i − 1)(2P_i + 5)] / 18  (4)

where P_i is the number of values in the ith tied group and k is the number of groups with tied ranks. The standardized test statistic (S_std) is given by:

S_std = (S − 1)/√Var(S) if S > 0; 0 if S = 0; (S + 1)/√Var(S) if S < 0  (5)

S_std follows the standard normal distribution (S_std ∼ Φ[0, 1]), and H0 of the two-sided Mann-Kendall test cannot be accepted if the absolute value of S_std is greater than the theoretical value Φ_(1−α/2), with α being the significance level (5% for this study).
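The test as described can be sketched as follows (a straightforward O(n²) implementation of the S statistic, the tie-corrected variance, and the standardized statistic; SciPy supplies the normal critical value):

```python
import numpy as np
from scipy.stats import norm

def mann_kendall(z, alpha=0.05):
    """Two-sided Mann-Kendall trend test with tie correction.
    Returns (S, S_std, reject_H0)."""
    z = np.asarray(z)
    n = len(z)
    # S: sum of signs over all pairs (i < j)
    s = sum(np.sign(z[j] - z[i]) for i in range(n - 1) for j in range(i + 1, n))
    # variance of S, corrected for tied groups
    _, counts = np.unique(z, return_counts=True)
    ties = np.sum(counts * (counts - 1) * (2 * counts + 5))
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    # continuity-corrected standardized statistic
    if s > 0:
        s_std = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        s_std = (s + 1) / np.sqrt(var_s)
    else:
        s_std = 0.0
    reject_h0 = abs(s_std) > norm.ppf(1 - alpha / 2)
    return s, s_std, reject_h0
```

Applied to a strictly increasing sequence, every pair contributes +1 and H0 is rejected; applied to a constant sequence, S = 0 and no trend is detected.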

Frequency Models and Model Comparison
To estimate the frequency of AMSLs, we relied on EVT, that is, a statistical method that provides a theoretical framework for calculating the probability of occurrence of extremely high/low events (Coles et al., 2001). Based on EVT, we modeled annual maxima sea levels with the generalized extreme value (GEV) distribution, given by:

F(z) = exp{ −[1 + ξ(z − μ)/σ]^(−1/ξ) },  1 + ξ(z − μ)/σ > 0  (6)

where F denotes the cumulative distribution function, μ is the location parameter, σ is the scale parameter, and ξ is the shape parameter representative of the tail behavior: (a) ξ > 0 represents the heavy-tailed case (Fréchet type), (b) ξ < 0 refers to the short-tailed case (Weibull type), and (c) ξ → 0 identifies the light-tailed case (Gumbel type), for which:

F(z) = exp{ −exp[−(z − μ)/σ] }  (7)

The expected number of exceedances (N) of a sea level (z) in the case of Gumbel (ξ → 0), for example, can be expressed as:

N(z) = exp[−(z − μ)/σ]  (8)

while for calculation of the AF, the new expected number of exceedances assuming a MSL rise of δ amount is then given by (Buchanan et al., 2017):

N(z, δ) = exp[−(z − μ − δ)/σ] = N(z) · exp(δ/σ)  (9)

Equation 9 implies that shifting the location parameter (μ) of a stationary Gumbel distribution by a SLR amount (δ) allows for modeling probabilities of annual maxima under a certain RCP scenario, that is, given the year and thus the magnitude of δ. The same applies to the case where ξ ≠ 0, that is, to the Fréchet or Weibull distribution.
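Equations 8 and 9 translate directly into code; the parameter values below are illustrative only, not taken from any fitted gauge:

```python
import numpy as np

def gumbel_exceedances(z, mu, sigma):
    """Expected annual number of exceedances of level z under a
    stationary Gumbel model (Equation 8)."""
    return np.exp(-(z - mu) / sigma)

def amplification_factor(delta, sigma):
    """Equation 9: raising the location by delta multiplies the
    expected exceedance rate by exp(delta / sigma)."""
    return np.exp(delta / sigma)

# Illustrative (hypothetical) values: scale of 0.15 m, 0.35 m of SLR
sigma, delta = 0.15, 0.35
af = amplification_factor(delta, sigma)  # roughly a 10-fold increase in frequency
```

The multiplicative form of the AF makes the sensitivity to σ explicit: the smaller the scale parameter relative to the SLR amount, the more dramatic the frequency amplification.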
In this study, to model non-stationary conditions, we instead let the location parameter μ covary with MSL (which varies over time), assuming a simple linear dependence of the following form:

μ(t) = b + w · MSL(t)  (10)

where b is a bias term, w is a weight, and MSL(t) is expressed relative to the year 2000 CE baseline MSL. This linear dependence can be relaxed to allow for more complex (e.g., quadratic) interactions, but the linear form results in reduced uncertainty in the sense that fewer parameters are computed. It is evident from Equations 9 and 10 that the non-stationary analysis will yield similar results to the AF only when the bias term (b) is equal to the stationary location and w = 1, assuming a common scale (σ) that does not change over time. The GEV distribution parameters in this work were obtained by means of optimization (maximization) of the likelihood function (Castillo et al., 2005). The maximum likelihood parameter estimates are approximately normally distributed, with the inverse Hessian matrix being equal to the variance-covariance matrix; thus, confidence intervals of the parameters follow directly from the normality approximation (Coles et al., 2001). The quantiles of extremes associated with a probability of exceedance p, or else a return period 1/p, can then be retrieved from the following expressions:

z_p = μ − (σ/ξ) { 1 − [−ln(1 − p)]^(−ξ) },  ξ ≠ 0
z_p = μ − σ ln[−ln(1 − p)],  ξ → 0  (11)

Under non-stationary conditions, we blended uncertainty stemming from both the GEV distribution parameters and the SLR projections by carrying out a Monte Carlo experiment with 1,000 iterations. In each iteration, we drew a SLR value at random from the respective Gaussian distribution (Section 2.2). Further, we randomly sampled each parameter of our non-stationary model from a Gaussian distribution with mean equal to the maximum likelihood estimate and standard deviation obtained from the variance-covariance matrix.
These random values were used in conjunction with Equations 10 and 11 to yield an arbitrary return level associated with different probabilities of exceedance (p). The 90% confidence intervals were then constructed by obtaining the 5th and 95th percentiles of these 1,000 random return levels.
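A compact sketch of the non-stationary Gumbel fit and the Monte Carlo blending of uncertainties might look as follows. All inputs are synthetic, and we parameterize the scale as log σ to enforce positivity, a minor departure from sampling σ directly:

```python
import numpy as np
from scipy.optimize import minimize

def fit_nonstat_gumbel(amsl, msl):
    """Maximum-likelihood fit of a non-stationary Gumbel model with
    mu(t) = b + w * MSL(t) (Equation 10) and a constant scale sigma.
    Returns the MLE and an approximate variance-covariance matrix."""
    amsl, msl = np.asarray(amsl), np.asarray(msl)

    def nll(theta):
        b, w, log_sig = theta
        sig = np.exp(log_sig)
        y = (amsl - (b + w * msl)) / sig
        # negative log-likelihood of the Gumbel distribution
        return np.sum(np.log(sig) + y + np.exp(-y))

    x0 = np.array([np.mean(amsl), 1.0, np.log(np.std(amsl))])
    res = minimize(nll, x0, method='BFGS')
    return res.x, res.hess_inv  # inverse Hessian as covariance

def mc_return_level(theta, cov, slr_mu, slr_sd, T=100, n=1000, seed=0):
    """Monte Carlo blending of parameter and SLR-projection uncertainty
    for the T-yr return level (Equation 11, Gumbel case)."""
    rng = np.random.default_rng(seed)
    pars = rng.multivariate_normal(theta, cov, size=n)
    slr = rng.normal(slr_mu, slr_sd, size=n)
    b, w, sig = pars[:, 0], pars[:, 1], np.exp(pars[:, 2])
    z = (b + w * slr) - sig * np.log(-np.log(1 - 1.0 / T))
    return np.percentile(z, [5, 50, 95])
```

The 5th and 95th percentiles of the sampled return levels then form the 90% confidence interval, as in the experiment described above.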
To compare different model configurations, that is, different types of GEV distributions and, subsequently, stationary versus non-stationary conditions, we utilized the likelihood ratio test (LRT) (Coles et al., 2001). Let M0 and M1 be two nested models such that M0 ⊂ M1; then the deviance (D) statistic is defined as:

D = 2 [L(M1) − L(M0)]  (12)

where L(·) indicates the maximum log-likelihood value. High values of D suggest that model M1 captures greater variability in the data than model M0. The statistic D is approximately chi-squared distributed (D ∼ χ²[k]) with the degrees of freedom (k) equal to the difference in the number of parameters of the two models (Coles et al., 2001). Hence, the null (H0) hypothesis that the two models are equivalent cannot be accepted if D is greater than the theoretical value of χ² at a significance level α (here, 5%).
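The LRT reduces to a few lines; the log-likelihood values below are made up for illustration:

```python
from scipy.stats import chi2

def likelihood_ratio_test(loglik_m0, loglik_m1, k, alpha=0.05):
    """Deviance D = 2 * (L(M1) - L(M0)) compared against the
    chi-squared critical value for k extra parameters."""
    d = 2.0 * (loglik_m1 - loglik_m0)
    return d, d > chi2.ppf(1 - alpha, df=k)

# Hypothetical fits: a non-stationary model with one extra parameter
# improving the maximum log-likelihood from -120 to -115
d, prefer_m1 = likelihood_ratio_test(-120.0, -115.0, k=1)
# D = 10 exceeds the 5% critical value of about 3.84, so the
# simpler (stationary) model is rejected
```

An improvement of only half a log-likelihood unit (D = 1) would, by contrast, fall well below the critical value, and the simpler model would be retained.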

Trend Analysis of Extremes
After pre-processing the GESLA-3 database, the trends of annual maxima still water levels are computed for the selected 204 tide gauges across the world; trend detection is performed using the Mann-Kendall test (Wang et al., 2020) and the τ-quantic test (Text S1 in Supporting Information S1) at a 95% confidence level. Because the latter test results in minimal discrepancies with the former (Table S1 in Supporting Information S1), here we present trends based on the MK test only. The majority (148/204) of the studied tide gauges exhibit a positive trend in the ESLs, with a substantial proportion (93/148) of these positive-trend gauges displaying a statistically significant trend (Figure 1). This finding of increasing magnitude of extreme sea events over the past decades is in line with previous studies (Menéndez & Woodworth, 2010; Woodworth & Blackman, 2004). The positive trends are primarily detected along the coastlines of the United States, the North Sea, the central and west Pacific Ocean, and Oceania. In contrast, negative patterns are mainly observed in the Gulf of Alaska, the Arctic, and the Baltic Sea, with the number of tide gauges showing negative trends (56/204), and the subset of these that are statistically significant (22/56), being considerably smaller than those with a significant positive trend. When the annual maxima are reduced to MSL, the trend analysis reveals a different story (Figure S1 in Supporting Information S1). Although the number of gauges with positive trends (126/204) is still considerable, the fraction of those with a significant trend (20/126) is substantially smaller. Correspondingly, we recognize more gauges showing negative trends (78/204), but still only a small fraction of these are statistically significant (10/78). Interestingly, groups of gauges (rather than individual gauges) previously demonstrating positive trends, mainly in the southwest of Europe, the central Pacific, and Japan, exhibit a negative trend when the ESLs are reduced to MSL.
This implies that in these regions a declining trend in the combined activity of storm surge and tide, that is, storm-tide, has canceled out the SLR trend. Likewise, in the Baltic Sea and northern Europe, where trends in the annual maxima shift from negative to positive after detrending the ESLs, the storm-tide component has likely been increasing (Calafat et al., 2022), overwhelming the decreasing rate of MSL. All in all, our trend analysis indicates that MSL rise is the key driver of significant positive trends in the ESLs of most tide gauges around the world that we examined. Negative trends, on the other hand, appear to coincide with declining MSLs in regions of crustal uplift induced by glacial isostatic adjustment (Doser & Rodriguez, 2011; Richter et al., 2012).

Frequency Analysis of Extremes
To evaluate the effects of SLR on the frequency of ESLs, we first modeled historical (hereinafter also current/present-day) return levels by fitting the stationary GEV distribution to annual maxima still water levels.
We fitted the GEV distribution to annual maxima from the 115 significant-trend gauges, except for 8 gauges for which the year 2000 (necessary for subsequent modeling of non-stationarity) had to be discarded after data quality control.
To this end, we examined both a Weibull/Fréchet type (ξ ≠ 0) and a Gumbel type (ξ → 0) GEV distribution and assessed the differences between the two models using the LRT. We found that for the majority (∼60%) of these gauges the Gumbel distribution provided a better fit to the annual maxima sea levels than the Weibull/Fréchet distribution; a result in accordance with earlier studies. Next, we modeled non-stationary conditions by allowing the location (μ) parameter to covary with MSL, the latter expressed relative to the year 2000 CE baseline. The rationale behind examining covariability merely in the μ parameter stems from our trend analysis showing that trends are primarily due to MSL change. On the contrary, the scale (σ) parameter of the extreme value model is associated with the storm-tide component of still water level, while the shape (ξ) parameter is hard to estimate with precision and hence unreasonable to model as a function of a covariate or time (Coles et al., 2001). For almost all of these gauges with a trend in the ESLs (all except 2), the non-stationary GEV model yielded a better fit than the stationary one, as suggested by the LRT. This finding implies that the stationarity assumption of the GEV distribution is violated within the context of SLR, and so projected ESLs can be better modeled by accounting for an increasing MSL. Once the non-stationary models were established, we made use of historical 100-yr return levels derived from the stationary GEV model and, in conjunction with median SLR projections, quantified the respective non-stationary return periods under future global warming scenarios. Here, we aim to emphasize the importance of considering non-stationarity by contrasting future return periods which incorporate the effect of SLR with current best-practice recurrence periods, which are generally assumed to be stationary.
Figure 2 presents such results for RCP4.5 and 8.5 (2050) at selected positive-trend tide gauges, for which we first ensured that the respective GEV distribution provided an overall adequate fit by examining quantile-to-quantile plots. An immediate remark is that SLR will significantly raise the probability of occurrence of the present-day 100-yr event across the globe, with very few exceptions of gauges in the Gulf of Mexico, the North Atlantic, and northern Europe (Figure 2). Seventy-five percent of the gauges are expected to experience their historical 100-yr return level at least every 15 years on average by the mid-21st century under RCP4.5 (Figure S2 in Supporting Information S1). By then, under RCP8.5, 75% of the tide gauges will experience their 100-yr event at least every 9 years on average (Figure S2 in Supporting Information S1). A complete list of all 80 gauges and the non-stationary return period of their historical 100-yr return level under the RCP4.5 and 8.5 (2050) scenarios is provided in Table S2 in Supporting Information S1. By the end of this century, almost all tide gauges will encounter their historical 100-yr return level on an annual basis, under both RCP4.5 and 8.5 scenarios.
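The underlying computation, re-expressing a historical return level as a return period under the non-stationary model after some amount of SLR, can be sketched for the Gumbel case as follows. We use the exact Gumbel annual exceedance probability rather than the expected-count approximation of Equation 8, and all parameter values are illustrative, not the paper's fits:

```python
import numpy as np

def future_return_period(z_hist, b, w, sigma, slr):
    """Return period (years) of a historical level z_hist under the
    non-stationary Gumbel model, with the location evaluated at a
    given MSL rise via Equation 10."""
    mu_future = b + w * slr
    # annual exceedance probability from the Gumbel CDF
    p_exceed = 1.0 - np.exp(-np.exp(-(z_hist - mu_future) / sigma))
    return 1.0 / p_exceed

# Illustrative values: the ~100-yr level of a stationary Gumbel(3.0, 0.15)
# is about 3.69 m; with b = 3.0, w = 1.0 and 0.35 m of SLR, that level
# recurs roughly every 10 years
rp = future_return_period(3.69, b=3.0, w=1.0, sigma=0.15, slr=0.35)
```

With b equal to the stationary location and w = 1, this reduces to the AF route of Equation 9; fitted values of w different from 1 are precisely where the two approaches diverge.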
We outline the evolution of ESLs over time under the RCP4.5 and 8.5 scenarios by focusing on the 100-yr event at three major port cities with large populations: (a) San Diego, USA, (b) Chiba, Japan, and (c) Sydney, Australia. The tide gauge of Chiba lies on Tokyo Bay, ∼40 km from Tokyo, and thus is also representative of sea level information along the coast of the Tokyo megalopolis. Figure 3 illustrates the decadal development of the 100-yr return level from 2050 to 2100 under the two scenarios, starting from the historical (stationary) value. An immediate observation for all three sites is that differences in ESLs between RCP4.5 and 8.5 are subtle for the first two decades, that is, from 2050 to 2070, primarily due to small differences in projected SLR values (Figure 3). However, from 2070 onwards, the time series of the 100-yr event under RCP8.5 starts to deviate upwards, becoming significantly different from RCP4.5 by the end of the century. Specifically, by 2100, the 100-yr return level at the Chiba station will reach 4.80 and 5.61 m under the RCP4.5 and 8.5 scenarios, respectively, much greater than the historical 3.60 m. In Sydney, the 100-yr event will be 3.23 m (RCP4.5) and 3.78 m (RCP8.5), in contrast to the historical level of 2.41 m. Also, San Diego, where the historical 100-yr event is 3.64 m, will experience a 100-yr return level of 4.62 and 5.26 m under the moderate and high concentration scenarios, respectively. Differences in ESLs between the two scenarios are ≥0.55 m for all three stations. If CO2 emissions continue to rise throughout the 21st century, keeping concentration levels high (RCP8.5), the findings here suggest that SLR will pose a drastically increased flood hazard to dense urban environments, threatening to leave entire coastal communities underwater.
We also assess the uncertainty in future (here, 2050) return curves associated with uncertainty in SLR projections and non-stationary GEV distribution parameters, highlighting results from three cities: (a) Washington, USA, (b) Kushimoto, Japan, and (c) Adelaide, Australia. To incorporate uncertainty from both sources, we initially fitted normal distributions to the sea-level projections. A Monte Carlo simulation with 1,000 iterations was then conducted to estimate the 90% confidence intervals (CIs) of the future return curves. Our assessment revealed considerable compound uncertainty. In Washington, for example, the 90% range of the projected 100-yr event is from 4.70 to 7.70 m under the RCP4.5 scenario (Figure 4). In the coastal city of Kushimoto, the anticipated 100-yr event under RCP4.5 ranges between 3.43 and 3.93 m (Figure 4). The same range for Adelaide is from 4.00 to 4.75 m (Figure 4). A complete list with the 90% CIs of future (RCP4.5 and 8.5, 2050) 100-yr events for all 80 tide gauges is provided in Table S3 in Supporting Information S1. These rather wide bounds emphasize the profoundly uncertain context within which sustainable design of coastal structures must be carried out, balancing the trade-off between cost and risk. Under RCP4.5 and 8.5, the local median SLR for the city of Washington (ξ > 0) is 35 and 40 cm, respectively (Figure S3 in Supporting Information S1). These values result in a considerable upward shift of the historical return curve (Figure 4), increasing the median historical 100-yr level by ∼20% and 22%, respectively. However, for Kushimoto (ξ → 0), approximately the same amount of median SLR, that is, 37 and 43 cm under the two scenarios (Figure S3 in Supporting Information S1), increases the median present-day 100-yr event by only ∼8% and 10%, respectively (Figure 4).
Accordingly, in the case of Adelaide (ξ < 0), the expected SLR of 26 and 30 cm (Figure S3 in Supporting Information S1) positively shifts the median historical 100-yr level by ∼16% and 18%, respectively. It is evident that SLR will intensify the ESLs at all gauges across the globe, but our analysis underlines the fact that, irrespective of the magnitude of SLR itself, this intensification will vary greatly by region depending on the characteristics of the non-stationary distribution to be used for capturing extremes in place of current practice, that is, a stationary distribution. Our analysis also reveals that relying on a stationary distribution in conjunction with the AF to compute the frequency of future extremes, as has been done in past studies, may lead to discrepancies between the true parameters and the parameters used. For example, in Washington, we find the shape parameter of the stationary distribution to be ξ = 0.26 (or ξ = 0.45 if we first detrend the data), while the same parameter is ξ = 0.49 for the superior-fit non-stationary distribution. Likewise, in Adelaide, ξ is estimated around −0.01 when considering non-stationarity, as opposed to ξ = −0.21 (or −0.10 after subtracting annual MSL) when ignoring MSL as a covariate. Since such discrepancies in the shape parameter of the GEV distribution can arise depending on whether non-stationarity is accounted for or not, and because return curves of extremes as well as the AF itself are sensitive to ξ (Buchanan et al., 2017), this constitutes additional evidence for why a non-stationary distribution will more appropriately characterize extremes that evolve over time. The same applies to the scale parameter in the case of the Gumbel distribution.
To further explore the distinct differences between non-stationary distributions and stationary distributions used with the AF, we now focus on three tide gauges for which the Gumbel type provides the best fit, that is, Aburatsu, JPN, Charlottetown, CAN, and Wilmington, USA, and examine the importance of using non-stationary distributions for calculating return levels under future SLR scenarios. In 2100, the median SLR amount under the RCP4.5 scenario for the three tide gauges is δ = 1.18 m, δ = 1.11 m, and δ = 1.02 m, respectively, while under RCP8.5 the respective amounts are 1.82, 1.73, and 1.61 m. Table S4 in Supporting Information S1 shows the (non-)stationary Gumbel distribution parameters for these three tide gauges. It is evident that for a fixed amount of SLR (δ), the location parameter of the non-stationary Gumbel (Equation 10) is not the same as the location of the stationary Gumbel merely increased by δ (Equation 9), due to discrepancies between μ and b, but primarily because w ≠ 1. In addition, there are also small differences between the scale (σ) parameters, with the non-stationary ones being more adequate in the sense that the LRT indicated a better fit when considering non-stationarity. Under RCP4.5 (8.5), the 100-yr return level in 2100 for Aburatsu is 4.48 (5.12) m if we consider the stationary distribution shifted by δ. In contrast, if we consider the non-stationary parameters, the 100-yr return level in 2100 for the same station becomes 4.95 and 5.89 m under RCP4.5 and 8.5, respectively. These differences of +47 and +77 cm in extreme sea levels are crucial when considering strategies for coastal flood hazard mitigation. Similarly, for Wilmington, these differences in coastal flood hazard stemming from the distribution used reach up to +53 and +95 cm (observe the high values of b and w in Table S4 in Supporting Information S1).
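The contrast between the two routes to a future return level can be sketched as follows; the parameter values are placeholders chosen for illustration, not those of Table S4:

```python
import numpy as np

def gumbel_rl(mu, sigma, T=100):
    """T-yr Gumbel return level (Equation 11 with xi -> 0)."""
    return mu - sigma * np.log(-np.log(1.0 - 1.0 / T))

# Placeholder parameters: a stationary fit (mu, sigma), a non-stationary
# fit (b, w, sigma_ns), and a SLR amount delta
mu, sigma = 2.9, 0.18
b, w, sigma_ns = 2.85, 1.35, 0.17
delta = 1.18

rl_stationary = gumbel_rl(mu + delta, sigma)           # Equation 9 route
rl_nonstationary = gumbel_rl(b + w * delta, sigma_ns)  # Equation 10 route
gap = rl_nonstationary - rl_stationary  # positive here because w > 1
```

With w > 1, the non-stationary location outgrows the simple μ + δ shift as δ increases, so the gap between the two estimates widens with the SLR amount, mirroring the behavior reported for Aburatsu and Wilmington.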
For most of the tide gauges examined, we find that future flood hazard is underestimated if non-stationary distributions are not considered. To support the validity of this assertion, we first plot the distribution of estimated w's (Figure S4 in Supporting Information S1) and perform a one-sample Wilcoxon signed-rank test to check whether the median of the distribution is significantly different from 1. Also, we conduct the two-sample variant of the same test to evaluate whether the distribution of b's is stochastically greater than that of the stationary μ's. At the 5% significance level, we find that both effects are significant, thus allowing us to infer that non-stationary parameters indeed yield increased hazard for the majority of tide gauges. However, there are of course cases (albeit few) where the use of a non-stationary distribution does not lead to increased estimates of future hazard. For example, this is the case for Charlottetown, where we find that the combination of the non-stationary parameters b and w (Table S4 in Supporting Information S1) leads to negative differences with respect to the stationary distribution with its location increased by the SLR amount. The differences for Charlottetown are −33 cm and −53 cm under RCP4.5 and 8.5 in 2100, respectively. All in all, the findings presented here support the assertion that non-stationary GEVs result in different, and arguably more appropriate, coastal flood hazard projections, considering that they provide a significantly better fit than the stationary distributions.
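A minimal sketch of the two rank tests described above, run on synthetic stand-ins for the per-gauge estimates (the arrays are simulated, and the Mann-Whitney U test is used here as the two-sample rank-sum variant):

```python
# Synthetic illustration of the significance tests: the arrays below are
# simulated stand-ins for the fitted w's, stationary mu's, and b's.
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

rng = np.random.default_rng(0)
w = rng.normal(1.2, 0.15, size=60)             # fitted covariate slopes
mu_stat = rng.normal(2.0, 0.3, size=60)        # stationary location params
b = mu_stat + rng.normal(0.25, 0.05, size=60)  # non-stationary intercepts

# (i) one-sample Wilcoxon signed-rank test, H0: median(w) = 1
p_w = wilcoxon(w - 1).pvalue
# (ii) one-sided two-sample rank test, H1: b stochastically greater than mu
p_b = mannwhitneyu(b, mu_stat, alternative="greater").pvalue
print(f"p (median w != 1): {p_w:.3g}")
print(f"p (b > mu)       : {p_b:.3g}")
```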

Conclusions
Our trend analysis, limited by the small number of tide gauges with a long (≥55 years) observational record, but also considering the interconnectivity of sea level information among regions, reveals that extremely high sea levels have been increasing over the last decades worldwide, with positive trends primarily attributed to MSL rise. Conversely, negative trends are observed at a few tide gauges situated in regions where crustal uplift has been taking place due to the long-term process of glacial unloading. Cross-validation of Mann-Kendall trends is performed using the τ-quantic stationarity test; our results appear to be robust to different trend detection methods. It is noted that even though in some regions counteracting physical processes combine to yield a non-significant trend in extremes, that does not necessarily imply that this will be the case in the future as well. For example, as shown in Section 3.1, a fair number of positive trends are not significant, some of which are observed in the Northeast US and Northern Europe. This may be attributed to the fact that vertical land motion due to glacial isostatic adjustment might balance out meltwater input and thermal expansion of seawater, among other reasons. Hence, in a warming climate, the latter two processes might outpace the tectonic adjustments, and so relative SLR may accelerate faster than what has been observed in the past, which in turn might lead to significant changes in extremes. However, these physical processes are complex, let alone their interactions, and thus a definite assertion with respect to the fate of future trends in extremes is precluded. The trend analysis of AMSLs outlined in this study relies on the latest version of the GESLA data, which were put together in late 2021.
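For reference, the Mann-Kendall test applied to each AMSL series reduces to a simple pair-counting statistic. The sketch below implements the basic version without tie correction (a simplifying assumption) and applies it to a synthetic upward-trending series, not real gauge data:

```python
import math

def mann_kendall(x):
    """Basic Mann-Kendall trend test (no tie correction)."""
    n = len(x)
    # S counts concordant minus discordant pairs
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # continuity-corrected normal approximation
    z = (s - math.copysign(1, s)) / math.sqrt(var_s) if s != 0 else 0.0
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided
    return s, z, p

# synthetic annual maxima with an imposed upward trend (illustrative only)
amsl = [2.0 + 0.01 * t + 0.03 * math.sin(t) for t in range(60)]
s, z, p = mann_kendall(amsl)
print(f"S = {s}, Z = {z:.2f}, p = {p:.3g}")
```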
It is evident, though, that as more data are continuously being collected, future studies would benefit from prolonged records, which can alleviate the modifiable temporal unit problem associated with trend detection (Cheng & Adepeju, 2014). Additionally, comparing trends derived from alternative sampling schemes, for example, the r largest events per year (Marcos et al., 2009), with those from the scheme employed here (annual maxima) would also increase the robustness of the observed trends. Finally, it is imperative to recognize that the trend-detection methods used in this study do not account for autocorrelation in the annual maxima time series. Because sea levels can exhibit long-term memory, partly due to the inertia of the global ocean (Dangendorf et al., 2014), serial correlation may affect the detection of significant trends (Yue et al., 2002). Future research regarding trends in sea-level extremes could thus employ more sophisticated models for trend evaluation, such as a modified version of the τ-quantic test presented here which can account for autocorrelation in the data (Text S1 in Supporting Information S1). Having said that, our trend analysis presumes independent (thus uncorrelated) annual maxima sea levels, which is also an indispensable assumption of the subsequent extreme value analysis and maximum likelihood estimation.
By 2050, our frequency analysis suggests that climate change, and particularly SLR, will have a striking impact on the frequency of coastal extremes. Under median SLR projections, most of the tide gauges will experience a 6- to 11-fold increase in the frequency of their present-day 100-yr event under RCP4.5 and 8.5, respectively. The same 100-yr event will likely occur on a yearly basis by the end of the century, even under a moderate concentration scenario. Since >600 million people currently reside in low-lying coastal regions, with this number expected to rise in the future (Neumann et al., 2015), our results highlight the need for comprehensive management of flood risk which considers changes in ESLs over the following decades. As discussed above, we find that the concentration pathways play a crucial role in the projected coastal flood hazard from 2070 and beyond. More importantly, in this work we show that non-stationary distributions are more appropriate for modeling future coastal flood hazard than considering the equivalent stationary distribution and shifting the location parameter by the respective amount of SLR. We highlight that these differences might be substantial depending on the tide gauge of interest. These differences stem from discrepancies between the estimated stationary and non-stationary parameters.
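The frequency amplification reported above amounts to inverting the fitted distribution at today's 100-yr level once the location has shifted with MSL. A minimal sketch under a Gumbel model, with assumed illustrative parameters rather than fitted ones:

```python
# Return-period amplification under SLR: fix today's 100-yr level, shift the
# Gumbel location by w * delta, and invert the CDF at that fixed level.
# Parameters and SLR amounts are assumed illustrative values.
import math

mu, sigma, w = 2.5, 0.20, 1.0   # Gumbel location/scale and covariate slope
y100 = -math.log(-math.log(1 - 1 / 100))
z100 = mu + sigma * y100        # present-day 100-yr return level

for delta in (0.25, 0.50):      # assumed SLR amounts by 2050 (m)
    cdf = math.exp(-math.exp(-(z100 - (mu + w * delta)) / sigma))
    T_new = 1 / (1 - cdf)
    print(f"delta = {delta:.2f} m -> today's 100-yr event every {T_new:.1f} yr")
```

Even modest shifts in location compress the return period sharply, since the Gumbel exceedance probability responds exponentially to the location offset.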
While our probabilistic return curves could assist local risk management, our analysis does have certain limitations. First, as highlighted in the "Sea-level annual maxima and projections" subsection, the SLR projections used in this study are treated probabilistically only after assuming a Gaussian distribution. In reality, these projections constitute an ensemble of simulations, and thus unique distributions do not exist. Future studies would benefit from a multi-distribution approach. Also, while these sea-level projections expand scientific understanding by incorporating previously ignored physical mechanisms, there still exist processes which are currently not implemented in any ice-sheet retreat model. It is apparent that as new knowledge arises and SLR projections are revised, information on the anticipated ESLs should be revised accordingly. Second, the non-stationary frequency models employed here rely solely on covariability with MSL, even though a few tide gauges exhibit trends in extreme storm-tide activity as well, as shown in the subsection "Trend analysis of extremes." For this small number of gauges, allowing the scale (σ) parameter to covary with storm characteristics would thus be more appropriate than solely letting μ covary with MSL. It should be noted, though, that finding a common meaningful covariate to model non-stationarity of the scale parameter at different tide gauges is inherently difficult; there exist numerous options, for example, maximum wind speed or minimum sea-level pressure during a storm, storm size (e.g., the radius of an (extra)tropical cyclone), storm duration, etc. Besides, difficulties also stem from the acquisition of meaningful projections for such covariates. A major assumption of this work is that future storm activity will remain the same as in the past, that is, MSL will still be the main driver of ESL magnitude in the years to come.
This assumption is further supported by the fact that climate model projections exhibit low confidence as to what the frequency and intensity of future storms will be (Knutson et al., 2010), while the latest literature on the matter also suggests that natural climate variability overwhelms trend signals in projected global storm surges (Shimura et al., 2022). It is also important to acknowledge that tide activity might exhibit future changes due to SLR as well, albeit small and hardly predictable ones (Wahl & Dangendorf, 2022), and these are thus neglected in this work. Finally, while our frequency analysis is based on well-grounded EVT, future studies could benefit from new frameworks for extreme value analysis such as the metastatistical extreme value approach (Marani & Ignaccolo, 2015), which has been shown to be competitive in model intercomparison studies (Caruso & Marani, 2022). Notwithstanding these shortcomings, this study benefits from explicit non-stationary distributions that capture the true properties of a system that evolves over time, and our analysis provides robust probabilistic assessments of future flood hazard for coastal communities.
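The Monte Carlo propagation of SLR-projection uncertainty described above can be sketched as follows, assuming (as in the text) a Gaussian fit to the projection ensemble; the Gumbel parameters and the projection mean and spread below are illustrative assumptions:

```python
# Propagate Gaussian SLR uncertainty through a non-stationary Gumbel model
# to obtain a distribution of the 2100 100-yr return level.
# All numerical values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, w = 2.5, 0.20, 1.0           # assumed Gumbel params / slope
delta = rng.normal(1.0, 0.25, 10_000)   # sampled SLR by 2100 (m)

y100 = -np.log(-np.log(1 - 1 / 100))
z100 = mu + w * delta + sigma * y100    # per-sample 100-yr level
lo, med, hi = np.percentile(z100, [5, 50, 95])
print(f"2100 100-yr level: median {med:.2f} m, 90% interval [{lo:.2f}, {hi:.2f}] m")
```

A complete treatment would additionally sample the distribution parameters themselves, consistent with the joint uncertainty treatment described in the text.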