Modeling nonstationary extremes of storm severity: Comparing parametric and semiparametric inference

This article compares the modeling of nonstationary extreme events using parametric models with local parametric and semiparametric approaches also motivated by extreme value theory. Specifically, three estimators are compared based on (a) (local) semiparametric moment estimation, (b) (local) maximum likelihood estimation, and (c) spline‐based maximum likelihood estimation. Inference is performed in a sequential manner, highlighting the synergies between the different approaches to estimating extreme quantiles, including the T‐year level and right endpoint when finite. We present a novel heuristic to estimate nonstationary extreme value threshold with exceedances varying on a circular domain, and hypothesis‐testing procedures for identifying max‐domain of attraction in the nonstationary setting. Bootstrapping is used to estimate nonstationary confidence bounds throughout. We provide step‐by‐step guides for estimation, and explore the different inference strategies in application to directional modeling of hindcast storm peak significant wave heights recorded in the North Sea.


INTRODUCTION AND MOTIVATION
Statistical inference for extreme values has developed rapidly in recent years, offering considerable scope for practical application in science and engineering. Two seemingly divergent camps of statistical thought have emerged, proposing different approaches to extreme value modeling, yielding inferences which are not always obviously in agreement. The work presented in this article seeks to identify potential points of contact between these so-called parametric and semiparametric frameworks for extreme value inference, to encourage better common understanding and convergence of at least some practices, in particular for tackling nonstationary extremes. Nonstationarity is commonplace in environmental extremes; physical processes generate extreme values which typically vary systematically with covariates, including space and time. For the peaks-over-threshold (POT) method, where only data exceeding a threshold are used for analysis (Balkema & de Haan, 1974;Pickands III, 1975), various models have been proposed to capture nonstationarity, including those of Davison and Smith (1990) and Leadbetter (1991). In the parametric framework, nonstationarity can be incorporated within the appropriate distribution function for threshold exceedances by allowing the distribution's (shape and scale) parameters to vary with covariates (see, e.g., Chavez-Demoulin & Davison, 2005;Coles, 2001, and references therein). Important assumptions underpinning this approach are that the data generating process is locally stationary and that observations from the data generating process can be considered approximately independent given covariates.
Applications of nonstationary extreme value analysis are more numerous using parametric inference. For example, in an ocean engineering context, Forristall (2004) performs extreme value analysis of significant wave height for directional octants. This approach (i) accommodates directional nonstationarity (by conducting independent inferences per directional octant) and (ii) allows extreme quantiles for specific directional sectors to be estimated. Choice of number and widths of directional sectors is an open problem (see, e.g., Folgueras et al., 2019;Ross et al., 2018). Moreover, nonstationary inference provides a number of challenges in practical assessment and interpretation of uncertainty and risk (e.g., Feld et al., 2019;Jonathan & Ewans, 2007;Mackay et al., 2010).
The need to incorporate a nonstationary threshold within extreme value inference is well recognized for environmental applications. Robinson and Tawn (1997) base their approach on a nonstationary threshold to characterize the evolution of extreme sea currents. Northrop and Jonathan (2011) propose a covariate-dependent threshold estimated using quantile regression, and Northrop et al. (2016) propose a cross-validation procedure for threshold selection. We introduce a method for selection of a nonstationary threshold amenable to both parametric and semiparametric approaches to inference. The basic assumption is that on a narrow region of the directional domain there is a unique physical regime which dictates the tail heaviness of the underlying distribution although extreme events can become more (or less) frequent. In this work, we propose a procedure for nonstationary threshold selection, which incorporates a smooth weight function to describe the varying density of extreme observations with directional covariate on the circular domain. The smooth weight function avoids inconsistencies stemming from the sectoring approach identified by Forristall (2004). The threshold selection procedure is common to both parametric and semiparametric approaches used for extreme value estimation.
Estimating the effects of periodic or circular covariates including direction and season in extreme value analysis is challenging in general. Coles and Walshaw (1994) consider extreme value analysis of wind speed on a circular domain, and Mendez et al. (2008) a seasonal model for significant wave height. Since extreme value parameters might be expected to vary smoothly with covariate, various different smooth basis representations (e.g., splines) for parameters of the conditional distribution of threshold exceedances, as a function of covariate as useful (see, e.g., Jones et al., 2016;Oumow et al., 2012;Ugarte et al., 2012;Zanini et al., 2020). Covariate basis representations require careful choice of order of complexity to avoid overparametrization. Some representations are attractive since basis functions have compact support on the covariate domain, facilitating computationally efficient and stable inference . In this work we consider a parametric estimation scheme using a penalized periodic B-spline covariate representation.
An alternative approach to nonstationary extreme value analysis is to estimate parametric and semiparametric models which are local on the covariate domain. Local models avoid the complexity of global covariate representations, particularly problematic for covariate intervals where extreme events are rare. In this work we consider semiparametric moment and local parametric maximum likelihood models.

Objectives and layout
The goal in this article is to explore the synergies between different parametric and semiparametric approaches to inference on extreme value data nonstationary with respect to periodic covariate. The default parametric approach employed is maximum likelihood estimation for peaks over threshold assumed to follow a generalized Pareto distribution, with spline descriptions of model parameter variation with covariate on some covariate domain. For comparison, two other covariate-local stationary models are considered: the first uses local parametric maximum likelihood estimation, and the second local semiparametric moment estimation; to achieve the latter we extend the semiparametric methodology so that inference for quantities of interest in ocean engineering is possible and meaningful. We develop (a) a new common local heuristic procedure for extreme value threshold selection for both local parametric and semiparametric frameworks, and (b) local inference for generalized Pareto shape parameter, extreme value index, and related quantities.
We then present a comparative study showing how the spline parametric, local parametric, and semiparametric approaches provide complementary inference for directional extremes in terms of (a) extreme value threshold selection, (b) extreme value model estimation, and (c) extreme quantile estimation including the T-year value and right endpoint of the support of the underlying distribution when is finite. We illustrate the approaches in application to directional extreme value analysis of hindcast storm peak significant wave height recorded at a northern North Sea location offshore Norway. We conduct hypothesis tests to characterize variation in the max-domain of attraction of the distribution of storm peak significant wave height as a function of covariate. We hope to demonstrate that spline parametric, local parametric, and semiparametric approaches provide complementary nonstationary extreme value inference, and that local parametric and semiparametric approaches may be of practical benefit to practitioners in coastal and ocean engineering and environmental sciences.
The remainder of the article is organized as follows. The application to directional modeling of H sp s is introduced in Section 2. Section 3 provides theoretical motivation for nonstationarity analysis. Section 4 discusses nonstationary threshold estimation and extreme value model parameter estimation. Sections 5 and 6 discuss estimation of extreme value parameters and extreme quantiles. Results of the North Sea application are presented in Section 7. Finally, Section 8 provides discussion and conclusions. Appendix A provides theoretical motivation for nonstationary inference on peaks over threshold, and Appendix B provides algorithms for the estimation of extreme value threshold, spline generalized Pareto inference and for the overall analysis.

MOTIVATING APPLICATION
The data for the motivating application are described by Randell et al. (2016), consisting of observations of storm severity, in terms of storm peak significant wave height henceforth referred to as H sp s and storm direction at a location in the northern North Sea. Significant wave height H s measures the roughness of the ocean surface, and can be defined as four times the standard deviation of the ocean surface elevation at a spatial location for a specified period of observation. H sp s is then the maximum value of H s observed per time period corresponding to a storm event. Values of H sp s can be regarded as approximately independent given storm peak characteristics.
The application sample is taken from the WAM hindcast of Reistad et al. (2011), which provides time-series of significant wave height, (dominant) wave direction and season (defined as day of the year, for a standardized year consisting of 360 days) for 3-h sea-states for the period September 1957 to December 2012 at a northern North Sea location in the vicinity of the black disk in the upper panel of Figure 1. A hindcast is a physical model of the ocean environment, incorporating pressure field, wind field and wind-wave generation models in particular; the hindcast model is calibrated to observations of the environment from instrumented offshore facilities, moored buoys, and satellite altimeters in the neighborhood of the location for a period of time, typically decades. Extreme seas in the North Sea are dominated by winter storms originating in the Atlantic Ocean and propagating eastward across the northern part of the North Sea. Due to their proximity to the storms, sea states at northern North Sea locations are usually more intense than in the southern North Sea. Occasionally, the storms travel south-eastward and intrude into the southern North Sea producing large sea states. Directions of propagation of extreme seas vary considerably with location, depending on land shadows of the British Isles, Scandinavia, and the coast of mainland Europe, and fetches associated with the Atlantic Ocean, Norwegian Sea, and the North Sea itself. In the northern North Sea the main fetches are the Norwegian Sea to the North, the Atlantic Ocean to the west, and the North Sea to the south. Extreme sea states from the directions of Scandinavia to the east and the British Isles to the south-west are unlikely. The shielding by these land masses is more effective for southern North Sea locations, resulting in a similar directional distribution but reduced wave heights by comparison with northern North Sea locations.
At the location of interest, observations of storm peak significant wave height H sp s are isolated from the hindcast time-series using the procedure described in Ewans and Jonathan (2008) as follows. Briefly, contiguous intervals of significant wave height above a low peak-picking threshold are identified, each interval assumed to correspond to a storm event. The peak-picking threshold corresponds to a directional-seasonal quantile of H s with specified nonexceedance probability, estimated using quantile regression. The maximum of significant wave height during the storm interval is taken as the storm peak significant wave height H sp s . The values of directional and seasonal covariates at the time of storm peak significant wave height are referred to as storm peak values of those variables. The resulting storm peak sample consists of 2941 values. With direction from which a storm travels expressed in degrees clockwise with respect to north, the lower panel of Figure 1 shows a polar plot of observations of H sp s (in meters) versus direction . The land shadow of Norway (approximately the directional interval (45 • , 210 • )) has a considerable effect on the rate and size of occurrences with direction. In particular, there is a dramatic increase in both rate and size of occurrences with increasing direction at around 210 • , corresponding to Atlantic storm events from the south-west able to pass the Norwegian headland. We therefore expect considerable directional variability in extreme value parameter estimates for the sample. In contrast, the magnitude of the rate of change of both rate and size of occurrences with respect to season (not shown, but see Randell et al., 2016) is lower. Winter storms (approximately from October to March) are more intense and frequent. Only winter storms with storm peak events occurring in October to March including have been considered further in this work, corresponding to a sample of 1521 storm peak values.

THEORETICAL MOTIVATION FOR THE NONSTATIONARY CASE
In Section 3.1 we first summarize theoretical results and assumptions underpinning extreme value modeling for the stationary (or omni-directional, or covariate-free) case. In Section 3.2 we then present a framework for local modeling of nonstationary extremes with covariate over a circular directional domain. These form the basis of inference schemes presented in Section 4.

Stationary case
Suppose the available sample consists of realizations of n independent (or weakly dependent) and identically distributed random variables X 1 , X 2 , … , X n . Since the random variables follow the same (unknown) distribution function, for brevity use the symbol X to refer to any of the random variables when there is no need to be specific. The common distribution function is then F(x) = P(X ≤ x), for every x ∈ R. We denote by x F the right endpoint of the support of F, namely the ultimate value which bounds all possible observations from above, x F = sup{x ∶ F(x) < 1}. We note that x F may be less than or equal to infinity. The extreme value (or extreme types) theorem (Fisher & Tippett, 1928;Gnedenko, 1943;de Haan, 1970) establishes that the limit distribution of linearly normalized partial maxima M n = max(X 1 , … , X n ) (informally "block maxima"), with normalizing constants a n > 0 and b n ∈ R, must be one of only possible three extreme value distributions: Fréchet, Gumbel, or Weibull. These three types can be nested in the one-parameter generalized extreme value (GEV) distribution. Specifically, if a n and b n exist such that, for every continuity point of (nondegenerate) G, then G must be a GEV distribution with distribution function given by: for all x such that 1 + x > 0. We then say that F belongs to the max-domain of attraction of the GEV, for some ∈ R, and write F ∈ (G ). Parameter is conventionally referred to as the shape parameter in parametric literature, and the extreme value index (EVI) in semiparametric literature of extremes. A max-domain of attraction is thought of as a class enclosing all distribution functions which, in the limit, are associated with a particular extreme value index . For example, both Cauchy and Student's t-distribution with one degree of freedom are found in the same max-domain of attraction (G ), with = 1. The Fréchet max-domain of attraction, corresponding to > 0, contains all distributions exhibiting polynomial decay, such as the Pareto, again Cauchy and Student's t and the Fréchet itself. These distributions have infinite right endpoint x F . All distribution functions belonging to (G ) with < 0, referred to as the Weibull max-domain of attraction, are short-tailed with finite x F ; examples include the uniform and beta distribution. For = 0, a continuity argument gives G =0 (x) = exp (−e −x ), x ∈ R; the corresponding Gumbel max-domain of attraction ((G ) with = 0, or (G 0 )), of particular interest in many applied fields due to both simplicity of inference and the great variety of distributions possessing an exponential tail (with either x F < ∞ or x F = ∞). The normal, gamma and log-normal distributions are members of (G 0 ). Extreme value inference using block maxima has been practiced for many years, from the time of Gumbel (1958) in application to hydrology. However, in this article, inference is based on analysis of threshold exceedances or peaks over threshold (POT), motivated by characterization of the max-domains of attraction above in terms of exceedances of high threshold. Balkema and de Haan (1974) and Pickands III (1975) established that the max-domain of attraction condition (1) is equivalent to the assertion that the (conditional) distribution of X given X > u, with u near the right endpoint x F , converges to the generalized Pareto distribution (GPD) with distribution function 1 + log G . Analysis of threshold exceedances is potentially statistically more efficient than that of block maxima: in the former, all large values of threshold exceedance in the sample are admitted, including multiple occurrences of large values belonging to same block, which might be excluded in a block maximum analysis.
The parametric and semiparametric approaches to extreme value analysis of threshold exceedances considered in this article have strong conceptual similarities, and both approaches involve estimation of three quantities corresponding to threshold, generalized Pareto shape and scale in parametric inference, and threshold, extreme value index (EVI) and associated scale in semiparametric inference. There is a large literature on estimation of shape parameter in parametric inference, and the extreme value index in semiparametric inference. In this work we use maximum likelihood (ML) (Coles, 2001;Davison & Smith, 1990;Smith, 1987) and moment (M) estimators (Dekkers et al., 1989) as typical parametric and semiparametric estimators, due to their popularity. The ML estimator assumes that the limiting generalized Pareto distribution provides an accurate description of the largest values in the sample, as though the exceedances over a fixed threshold were drawn from an actual generalized Pareto distribution; this provides a highly flexible framework for inference. The moment estimator builds on the concept of max-domain of attraction, and in this sense is model-free; it is more robust: there are seldom any assurances that exceedances of high threshold actually follow the generalized Pareto form. The moment estimator can be viewed as intuitive, since its form involves summary statistics like averages. When goodness of fit can assessed with reasonable accuracy, the ML estimator can be shown to be efficient for the shape parameter. There is concern however that the maximum likelihood may be influenced by deviations of the true underlying distribution function from the assumed generalized Pareto, resulting in extrapolation bias. For this reason, it is sometimes preferable to consider a robust estimator building on moments to lessen the importance of such deviations, at the expense of increased variance.
Viewed parametrically or semiparametrically, choosing an appropriate extreme value threshold is fundamental to inference for peaks over threshold, and involves a typical bias-variance trade-off. If the threshold is set too high, the set of threshold exceedances is small resulting in increased variance of parameter estimates. If the threshold is set too low, the variance of the parameter estimate is reduced at the cost of increased bias. Sensitivity to the extreme value threshold choice is a common critical feature of parametric and semiparametric approaches. Confirming the relative stability of estimated shape parameter (or extreme value index) and high quantiles (such as the T-year level) with near-zero exceedance probability with respect to threshold is a key diagnostic test for peaks over threshold analysis, addressed further in Section 6.
Given threshold, inference for peaks over threshold in the parametric setting involves estimation of generalized Pareto shape and scale parameters, the latter being threshold-dependent; for estimation of T-year levels, we also need to estimate the rate of threshold exceedance. In the semiparametric setting, we focus on estimation of extreme value index; scale and location normalizing functions (akin to a > 0 and b ∈ R in (A5)) required for extreme quantile calculations, are estimated separately.

Nonstationary case
In the nonstationary setting, we assume that the covariate domain  is a circle, partitioned into m intervals, with centroids j , and write Θ for the set { j } m j=1 of centroids; for directional analysis considered here, we set  = [0 • , 360 • ), and We also assume that X( j ) and X( j ′ ) are independently distributed when j ≠ j ′ . We then assume the extreme value theorem holds on covariate neighborhood  ( j , h) of half-width h > 0 centered at j (see Section 4.1). This means that condition (1) holds with respect to F j and the shape parameter (or extreme value index) ( j ) governs the tail behavior of the underlying F j . We also assume throughout that changes in the tail dynamics of the data generating process are smooth, and that ( ) is a smooth (unknown) function of ∈ . These assumptions justify the nonstationary parametric and semiparametric inference procedures used in this work. The theoretical motivation for the nonstationary case is given in Appendix A, and summarized here. Suppose that for each ∈ , neighborhood  ( , h) contains N( ) random occurrences of X. We start by selecting k( ) threshold exceedances on that neighborhood, assuming a limiting generalized Pareto distribution holds for sufficiently large N( ). Given the initial assumption of (weak) dependence on , tail observations from F can be viewed as peaks over threshold at each as follows, for any ∈  ( , h) ⊂ . With the notation N = n × m, let X N−k( )∶N denote the (N − k( ))th neighborhood-specific threshold having selected fixed number k( ) for that neighborhood. To ensure integrity of statistical analysis on , the number k( ) is defined using a weighting function indicating "integer part of," and k is the total number of largest observations retained for inference across the whole of . For a finite number m of directional sectors j , the smooth function ( ) satisfies ∑ m j=1 ( j )∕m = 1. Given threshold u( ) = X N−k( )∶N , we perform nonstationary extreme value inference for peaks over threshold. In the parametric setting, inference involves estimation of generalized Pareto shape and scale parameters which vary on the covariate domain; for estimation of T-year levels, we also need to estimate the rate of threshold exceedance which also varies with covariate. The semiparametric approach is concerned with extrapolation beyond the sample through estimation of extreme quantiles with very small tail probabilities; semiparametric estimators are typically smooth functionals (averages, quantiles, etc.) of sample upper order statistics, estimating extreme value index and scale function, in terms of the extreme sample fraction k( )∕N.

NONSTATIONARY THRESHOLD SELECTION
Here we describe an approach to nonstationary threshold selection useful for all of parametric and semiparametric settings. The approach exploits a heuristic criterion to select a number k = k( ) of upper order statistics, and hence a random threshold on the covariate domain. Algorithm 1 of Appendix B provides a simple implementation of the threshold selection procedure. In Section 5 we then discuss how the estimated threshold is used in extreme value estimation using each of the parametric, local parametric, and semiparametric approaches. Note that although threshold estimation is made here using either the semiparametric moment estimator (Section 4.2) or ML estimation (Section 4.3), any other consistent estimator would be equally applicable.

Threshold selection
Let  ( j , h) be the covariate neighborhood with center at j ∈ Θ and fixed radius h > 0 defined by for every j for an appropriate choice of distance d: in the directional application, we adopt wrapped-Euclidean distance is present, so that the extreme value theorem holds on  ( j , h), for every j .
We now consider a heuristic threshold selection procedure exploiting the relative density of extreme values in  ( j , h). We seek the optimal number k * j of the k largest observations in  ( j , h) as threshold exceedances. Then k * j is the solution where 0 ≤ < 0.5, and̂k( j ) is a locally consistent estimator of ( j ) using the k largest observations in  ( j , h). The heuristic procedure (4) was introduced in the first edition of Reiss and Thomas (the most recent edition being Reiss & Thomas, 2007) and then studied in detail in Neves and Fraga Alves (2004). It facilitates automatic threshold choice, understood intuitively as follows. For small k, the weighted deviations in (4) tend to be large due to the inherently large variance of̂k( j ). As k increases, the summands in (4) are expected to decrease until bias sets in and overrides the variance from which point S (k) is expected to increase again. Minimizing the weighted empirical distance (4) is equivalent to optimizing the bias-variance trade-off by exploiting the settled behavior of estimates {̂k( j ) ∶ k < N( j )} for appropriate k.
We specify a covariate weight ( j ′ ; j ) of observations for every j ′ ∈  ( j , h) ∩ Θ relative to centroid j ∈ Θ using a von Mises kernel (cf., Pewsey et al., 2014), such that Concentration parameter > 0 controls kernel spread: as increases, the local density becomes more concentrated about j ; thus, plays a role similar to bandwidth h in (3), regulating degree of directional smoothness. The values of and h need to specified carefully during applications to reflect the characteristics of the data generating process, and the resulting choice of k * j reflects the extent of stationarity on  ( j , h). The corresponding optimal estimate for nonstationary threshold is denoted u( j ) for each j ∈ Θ.
In this section we indicate the dependence of ( j ′ ; j ) on centroid j explicitly for clarity. In the discussion in Appendix A, we write ( j ′ ) for notational convenience, with the dependence on j understood. We do not claim that this heuristic approach to extreme value threshold specification is the most effective (e.g., with respect to every estimator one might devise for extreme value index), but we have found it useful in the application to estimation of in Section 7.

Semiparametric estimation
Now we outline semiparametric estimation of k ( j ) required for threshold selection above. For each j ∈ Θ, we propose an adaption of the extended version of the moment estimator (Dekkers et al., 1989) with where X N( j ) −k,N( j ) denotes the (k + 1)th largest value in the observed sample in  ( j , h). Note only the largest k observations in  ( j , h) contribute in (7). Further discussion of the moment estimator is given in Section 5.1.

Maximum likelihood estimation
The local ML estimator̂k( j ) for j ∈ Θ for threshold selection above is obtained by maximizing the directionally weighted log-likelihood with respect to the parameter pair ( ( j ), u ( j ) ) ∈ (−1, ∞) × R + , with weights ( j ′ ; j ) as in Section 4.2, and

NONSTATIONARY EXTREME VALUE ESTIMATION
This section considers estimation of extreme value parameters and extreme quantiles. We first present adapted versions of widely used semiparametric estimators for extreme value index and associated scale function based on a known moving threshold u( ), ∈ . Then we discuss two classes of parametric maximum likelihood estimators for generalized Pareto shape and scale. The first is a minor modification of the local ML estimator introduced in Section 4.3. The second is motivated by a spline representation of the variation of generalized Pareto shape and scale with respect to covariate. Section 6 then addresses estimation of extreme quantiles using the estimated parameters.
In this and subsequent sections, we consider the nonstationary extreme value threshold u( ) to be determined and fixed; our focus is therefore on characterizing threshold exceedances. Regardless of the approach to extreme value estimation, the heuristic procedure of Section 4 is used to estimate u( ).
Descriptions of the semiparametric, local parametric, and spline parametric estimation schemes are given in Sections 5.1-5.3.

Local semiparametric moment estimation
Given threshold u( j ) for j ∈ Θ, we adopt the moment estimator for the extreme value index given bŷ where moments M (l) ( j ), l = 1, 2 are calculated using where k( j ) is the number of threshold exceedances in  ( j , h). Equation (11) corresponds to an unweighted version of the moment estimator used for threshold estimation in Equation (7). This moment estimator is motivated by the asymptotic behavior of the k( j )th upper order statistics associated with the sample of independent random variables X i ( j ′ ) − X N( j ) −k( j ),N( j ) on  ( j , h) as established in the theory of extremes for threshold excesses. Conditionally on X N( j ) −k( j ),N( j ) = u( j ), their underlying distribution function tends to , for all t > 0 (cf., de Haan & Ferreira, 2006, p. 90). Smith (1987) defined the analogous ML estimator in terms of excesses over a high threshold u. We refer the reader to Appendix A for a precise definition of F [u] , and further details on how it approaches the GPD distribution function.
An associated scale estimator (e.g., de Haan & Ferreira, 2006, section 4.2) necessary for semiparametric estimation of extreme quantiles in Section 6 is then given by The moment estimator for the scale function a * j ( N∕k( j ) ) is discussed in Appendix A(ii).

Local maximum likelihood estimation
Given threshold u( j ) for j ∈ Θ, local ML estimation described of generalized Pareto shape ( j ) and scale u ( j ) is performed by the procedure described in Section 4.3, with all directional weights set to unity. For completeness, the log-likelihood to be maximized is given by This formulation incorporates a semiparametric approach to threshold selection within a local parametric approach to parameter estimation: conditioned on the random number K = k( j ) of exceedances, random exceedances of threshold u( j ) can again be viewed as independent identically distributed random variables with distribution function converging asymptotically to the generalized Pareto (Appendix A).

Spline-based maximum likelihood estimation
Given nonstationary threshold u( j ) for j ∈ Θ, we can also estimate a nonstationary generalized Pareto model using the full sample of threshold exceedances, by allowing generalized Pareto shape and scale u to be functions of on . This is the type of analysis typically undertaken in environmental and ocean engineering settings (see, e.g., Northrop et al., 2016). Specifically, we assume a periodic cubic B-spline representation for the variation of generalized Pareto shape and scale parameters with covariate (see, e.g., chapter 5 of Wood, 2017;Zanini et al., 2020). We then estimate spline coefficients using maximum penalized likelihood estimation, regulating the roughness of shape and scale with covariate to optimize predictive performance assessed using cross-validation. On the index set Θ of covariate values j , j = 1, 2, … , m, we relate the values ( j ), u ( j ) of generalized Pareto shape and scale to the periodic B-spline basis via basis matrix B with elements B jb such that where n b is the number of basis functions, and the s are basis coefficients. The sample log-likelihood  is with ( ( j ), u ( j )| X i ( j ) − u( j )) defined in (9). n( j ) denotes the number of observations in the sample at covariate j , (a) = ( (a) 1 , (a) 2 , … , (a) n b ) ⊤ for a = 1, 2, and = ( (1)⊤ , (2)⊤ ) ⊤ . We set the number of spline knots on  to be more than sufficient to capture the anticipated parameter variability with covariate, and then penalize parameter roughness globally to obtain a model with good predictive performance. Penalization is performed using first-order difference penalties for the coefficients in (14), with difference matrix D given by: The penalized log-likelihood is then: where and are smoothing parameters chosen to maximize cross-validated predictive likelihood. In the application illustrated in Section 7, cross-validation is applied as follows. Each iteration of the cross-validation consists in using a bootstrap resample (of the original sample of threshold exceedances) as training set, and observations omitted from the bootstrap resample as test set. Note that the test sets corresponding to different bootstrap resamples may therefore overlap. The training set is used to estimate the model parameters for a grid of combinations of and , and the test set to assess prediction performance using predictive log likelihood. The choices of and provide best predicted performance are adopted for subsequent inferences. The procedure is outlined in Algorithm 2 in Appendix B.

ESTIMATION OF NONSTATIONARY EXTREME QUANTILE
In this section we use extreme value estimates from Section 5 to calculate extreme quantiles, including the right endpoint of the distribution of X( ) when this is finite, in the semiparametric setting (Section 6.1) and parametric setting (Section 6.2). The stepwise inference procedure is given in Algorithm 3 of Appendix B. A common motivation applies for estimation of extreme quantiles in both parametric and semiparametric settings. Suppose that F is the actual distribution function of the data generating process at direction ∈  and Q the corresponding quantile function; that is, Q = F ← , with the left arrow indicating a generalized inverse. Conditions (A2) and (A3) of Appendix A ensure that an extreme quantile F ← (1 − p), with 1 − p > F (u( )) for some high threshold u( ) and hence very small probability p, depends only on the tail of the distribution function F . Consequently, we can estimate an extreme quantile using linear functionalb + â Q H (1 − p), with Q H relating to a generalized Pareto distribution function H . This result applies for both parametric and semiparametric approaches, with small differences. The most obvious difference is that in the parametric approach, normalizing constants a and b are provided by the scale and location of the limiting generalized Pareto distribution, whereas in the semiparametric approach these are estimated as functionals of the sample analogue of distribution function F .

Extreme quantile
In the semiparametric setting, we assume that the number k( ) of exceedances of threshold u( ) is such that k( ) → ∞ and k( )∕N( ) → 0 as N( ) → ∞. That is, the number of threshold exceedances k( ) remains negligible compared with the total number N( ) of observations in  ( , h). This condition can be written in terms of the covariate-specific sample size n, requiring that n → ∞ implies N( ) → ∞. The proposed moment estimator for the extreme quantile with small probability p → 0, N( )p∕k( ) → 0 as n → ∞, conditioned on threshold u( ), iŝ with estimated extreme value index̂M k ( ) and associated scale parameterâ M given in Section 5.1. The associate scale in (17) is a function of the top sample fraction k( )∕N( ).

Finite right endpoint
Semiparametric estimators for the right endpoint have received considerable attention in the literature (e.g., chapter 4 of de Haan & Ferreira, 2006, and connected references). We propose a model-free, data-driven estimator of the covariate-dependent right endpoint x F < ∞ for ∈ , motivated by the general endpoint estimator of Fraga Alves and Neves (2014) with nonstationary threshold u( ). We proceed on the basis of a suitable extreme value condition which partitions the Gumbel max-domain of attraction (with extreme value index = 0) into two classes of distributions, one with finite right endpoint, and the other right unbounded. An example of a distribution function in the former class is the negative Fréchet, with distribution function F , (x) = 1 − exp{−( − x) − }, x ≤ , ∈ R, > 0; simple calculations show that F , belongs to (G 0 ) despite having finite right endpoint x F = < ∞. We seek a right endpoint estimator for distributions in this (G 0 ) subclass.
representing the random variables X i in a particular neighborhood  ( , h), defined in (3), we denote by Y 1,N( ) ≤ … ≤ Y N( )−k( ),N( ) ≤ … ≤ Y N( ),N( ) the corresponding ascending order statistics, and define the general endpoint estimator of x F , assumed finite, for every ∈ , This estimator is valid for any ( ) ≤ 0, and does not require prior estimation of extreme value index (cf., Fraga Alves et al., 2017), in contrast to competitor estimators for the right endpoint proposed in the literature.

Extreme quantile
In the parametric setting, level x p ( ) with small exceedance probability p corresponds to the quantile of the distribution of threshold exceedances for direction . This suggests defining the 1/p extreme quantile as the value x p ( ) = Q H (1 − p), with normalizing constants set to the scale and location parameters of a generalized Pareto function H . Hence, for local parametric and spline parametric approaches, estimates of x p ( ) of the form are obtained, where (u, ) = k( )∕N( ) is the sample fraction of exceedances of u( ) within  ( , h). Estimation of̂( ) and̂u( ) (for ∈ Θ) using local parametric and spline parametric approaches is discussed in Sections 5.2 and 5.3. The extreme-level estimatex ML p ( ) can be interpreted as a T-year level used in ocean engineering and hydrology: that is, the value exceeded with probability 1/T per annum. Expressions (19) and (17) show obvious similarities, and also distinctive traits of the semiparametric moment, local parametric and spline parametric approaches.
For local ML estimation (Section 5.2), the values of k( j ) and N( j ) are estimated directly in Section 4. In the spline maximum likelihood approach (Section 5.3), a smoothed version of (u, j ) is estimated as the probability of threshold exceedance for j ∈ Θ using logistic regression, with log-likelihood where j = k( j )∕N( j ) is the sample proportion of threshold exceedances of u( j ) at j ∈ Θ. j = (1 + exp[− j ]) is the probability of threshold exceedance at j , with = { j } m j=1 = B for B-spline basis matrix B and parameter vector to be estimated. Roughness penalization of , with optimal roughness coefficient estimated by cross-validation, ensures good predictive performance. The penalized log-likelihood thus takes the form  pen ( ) = ( ) − P, with P = ⊤ D ⊤ D (cf., Section 5.3).

Finite right endpoint
In case ( ) < 0, the limiting generalized Pareto distribution function has finite right endpoint. A consistent estimator for this right endpoint follows from (19) by setting p = 0 F I G U R E 2 Violin plots of H sp s observations to the total of N = 1521 data points, with n here referring to the number of observations in each of the five sectors considered

APPLICATION TO STORM PEAK SIGNIFICANT WAVE HEIGHT
The discussion above provides a suite of complementary approaches to nonstationary extreme value analysis incorporating elements of both parametric and semiparametric inference, featuring a common nonstationary extreme value threshold capturing covariate dependence of threshold exceedances. In this section, we apply the methodology to estimation of T-year values from the sample of storm peak significant wave height H sp S on storm direction introduced in Section 2. The mechanics of inference for extreme quantiles, and right endpoint if appropriate, is given in Algorithm 3 of Appendix B. For brevity henceforth, we refer to semiparametric, local parametric, and spline parametric estimators as "M", "local ML," and "spline ML,', with "M" indicating a moment estimator, and "ML" referring to maximum likelihood.
Exploratory analysis of the sample, supported by previous analysis by Randell et al. (2016), suggests that the covariate domain can be partitioned into five directional sectors assumed approximately homogeneous in terms of the characteristics of H sp S . Referring to Figure 1, directional sectors corresponding to the following intervals of were identified. Sector 1 corresponds to ∈ (0 • , 50 • ] ∪ (320 • , 360 • ], for storms propagating from the Norwegian Sea to the North; Sector 2 for ∈ (50 • , 140 • ] corresponds to the "land shadow" of Norway, with fetch-limited storms propagating from the coast with a more northerly direction relative to the normal to the coast; Sector 3 is ∈ (140 • , 210 • ], again for the Norwegian land-shadow, but with storms propagating from a more southerly direction; Sector 4 is ∈ (210 • , 270 • ] corresponding to storms from the Atlantic potentially "funneled" by the Norwegian coast; and Sector 5 with ∈ (270 • , 320 • ], for more northerly Atlantic storms. Further information about the underlying physics is given in Section 2. The partitioned sample is summarized in Figure 2, using so-called "violin" plots which add kernel density estimates to a box-whisker representation. The long-tailed behavior of storms from the Atlantic is clear in Sectors 4 and 5, compared with the fetch-limited characteristics in storms from Sectors 2 and 3. Although Sector 4 exhibits the largest values of threshold exceedances in Figure 2, there is evidence from the kernel density plots that Sector 5 has a relatively long tail. In this section we seek to quantify tail-heaviness by estimating ( ) using both parametric and semiparametric approaches, and hence estimate extreme quantiles.
Using the approach in Section 4, nonstationary thresholds u( ) were estimated using both the local ML and M estimator for lag h = 60 • , concentration = 10 and parameter = 0.35; a sensitivity study was conducted to ensure that threshold inference was not overly sensitive to these choices. Estimates are shown together in Figure 3. The general trends shown by the two estimates are in agreement across the covariate domain; for example, larger values of u( ) are observed for storms from the Atlantic (SW to W), and smaller values for fetch-limited directions (E to SE). Subsequent parametric and semiparametric inference for exceedance characteristics therefore has a relatively common starting point.
Estimateŝ( j ) for j ∈ Θ from each of the M, local ML an spline ML approaches described in Section 5 are displayed in Figure 4, in terms of bootstrap means and 95% confidence intervals. Note that the M estimate of was obtained using F I G U R E 3 Polar plot of extreme value thresholds u( ) from the M (orange) and ML (blue) estimators. Storm direction is the direction from which waves propagate; radial scale of H sp s is in meters F I G U R E 4 Bootstrap mean and corresponding 95% confidence intervals for̂( j ), j ∈ Θ based on: M (left), local ML (middle) and spline ML (right) estimators the corresponding M estimate of u; analogously the local ML and spline ML estimates of adopt the local ML estimate of u. There is reasonable overall agreement between the three estimates for , and the estimates are also qualitatively plausible given other analyses of these data  and general physical considerations. Effects of land shadows (e.g., ∈ (80,150)) resulting in lower values for are clear. The spline ML estimator is smoothest with respect to and its confidence band does not include zero for any direction. In contrast, ( ) = 0 is suggested by semiparametric and local parametric estimators, most obviously for the M estimator, implying that the data generating distribution may lie in the Gumbel max-domain of attraction.

Identifying domains of attraction
Given the evidence in Figure 4, it is interesting to consider whether we can identify intervals with ( ) < 0, ( ) = 0 (and ( ) > 0 in principle) for ∈  (cf., choice of domain of attraction, see Fraga Alves et al., 2017). We attempt this using two sets of hypothesis tests. In the first test, we adopt null hypothesis F ∈ (G 0 ) versus the two-sided alternative F ∈ (G ) ≠0 . Four different test statistics are considered, three of which are semiparametric ) l 1 {X i ( j )>u( j )} for l = 1, 2 in the notation of Section 5; these test statistics are discussed in Fraga Alves et al. (2017) and references therein. The ratio test (R) assesses the relative contribution of the maximum to the average of the excesses over the given threshold u. For the R test, the null hypothesis F ∈ (G 0 ) is rejected at level ∈ (0, 1) if R k,n ( ) < g ∕2 or R k,n ( ) > g 1− ∕2 , where g is the -quantile of the Gumbel distribution, g = − log(− log ). The Greenwood-type test (G) and the Hasofer and Wang test (HW) are moment-based, and associated critical regions follow from the central limit theorem. The critical region for the two-sided test of nominal size for T * ∈ {G, W} is given by |T * k,n ( )| > z 1− ∕2 , with z denoting the -quantile of the standard normal distribution.
When ( ) > −1∕2, a parametric test statistic based on the generalized Pareto distribution is available (cf., Section 3.1 and Appendix A). Asymptotic normality of ML estimators for peaks over threshold gives rise to a likelihood ratio (LR) test using deviance statistic to test the hypothesis, where  is the log-likelihood in (13). At level , the null hypothesis is rejected when the observed value for D exceeds the (1 − )-quantile of the 2 1 distribution (see, e.g., Coles, 2001). Results using the four test statistics are illustrated in Figure 5. The values for test statistics G and HW lie generally on or beyond their critical values as a function of , providing evidence to reject the null hypothesis that ( ) = 0 at the significance level = 5%. We note that upward crossings of the indicated = 5% critical barriers in the HW test sample path evidence an underlying short tailed distribution belonging the Weibull max-domain, whereas for the G-test, the Weibull max-domain is reflected in the downward crossings of corresponding = 5% critical barriers. The value of the R test statistic lies generally on or near its lower critical value, again providing some support for the Weibull max-domain of attraction; we note that this test statistic is recognized as being somewhat more conservative than G and HW, particularly for near-rejection in favor of the Weibull domain. The semiparametric approach provides a further possibility (Fraga Alves & Neves, 2014;Neves & Pereira, 2010) for hypothesis testing, specifically to test for a Gumbel max-domain of attraction with infinite right endpoint as null hypothesis, versus an alternative hypothesis of any max-domain of attraction with ( ) ≤ 0 and x F < ∞. No similar parametric test exists, since in the parametric setting a generalized Pareto distribution with ( ) = 0 corresponds to the exponential distribution which is right unbounded. In this sense, the semiparametric approach provides a better framework for tackling the statistical assessment of a finite upper bound. For every ∈ , we test using the test statistic T given by with M estimator â provided in (12). At the significance level ∈ (0, 1), the null hypothesis is rejected whenever √ k log k |T k,n ( )| > z 1− ∕2 , where z = Φ ← ( ) denotes the -quantile of the standard normal distribution function. Results using the test statistic T are shown in Figure 6. The null hypothesis of an infinite upper bound is rejected at the = 5% level for the majority of the directional domain, except notably for storms emanating from approximately (310 • , 360 • ), around 50 • and around 160 • . From a physical perspective, these observations can be explained by occurrences of storms from the Norwegian Sea ((310 • , 360 • )), funneling from the north-east along the Norwegian coast (50 • ), and rare occurrences of dominant wave directions from the Norwegian coast (160 • ). We summarize the results of hypothesis tests concerning choice of max-domain of attraction in Figure 7, and note directional intervals of ( ) = 0, F < ∞ corresponding to storms from the Atlantic funneled along the Norwegian coast from south-west to west, and along the Norwegian coast from north to north-east.

Estimating extreme quantiles and right endpoints
We use expressions (17) and (19) to estimate quantiles x p ( ) using M, local ML and spline ML estimators. Assuming N 0 storm occurrences in observation period T 0 , the T-year directional level corresponds to the value x p ( ) such that P{X( ) > x p ( )} = (T 0 ∕N 0 ) × T −1 . Figure 8 gives a matrix rose-plots of estimated directional 100-year and 10,000-year levels with accompanying 95% bootstrap confidence bands.
There is general qualitative agreement between the three estimates for 100-year level (top row) and 10,000-year level (bottom row), in terms of bootstrap mean. Uncertainties from the M estimation are somewhat larger, as might be expected recalling the evidence of estimation in Figure 4. Not surprisingly, estimated extreme levels for directions with short fetches ( ∈ [60,140) • ) are low, whereas those corresponding to long fetches from the Atlantic Ocean and Norwegian Sea ( ∈ [310,360) ∪ [0, 60) • ) are high. These findings are broadly in line with those from hypothesis testing in Figure 7.  Figure 7 indicates there is significant evidence in the H sp s sample for concluding that the underlying distribution is right bounded on much of the directional domain. This suggests that estimation of the right endpoint x F might be usefully attempted. Three estimators are considered, as described in Section 6. The local ML and spline ML estimators for x F are only appropriate when ( ) < 0, whereas the semiparametric general estimator does not require estimation of ( ) at all, but imposes the constraints ( ) ≤ 0, x F < ∞ by construction. To ensure fair comparison of the estimates for x F , we therefore repeated the local ML and spline ML generalized Pareto fits of Section 5, with the domain of restricted F I G U R E 9 Rose diagrams for the finite right endpoint of H sp s . Three estimators are used: the general estimator (left), the local ML (middle) and spline ML (right) estimators. Bootstrap means and corresponding 95% confidence bands are displayed as a function of direction measured clockwise from north to ( ) < 0. The left panel of Figure 9 shows estimates using general estimator (18). The center and right panels show estimates from local ML and spline ML estimation, with ( ) constrained to be negative everywhere. Clearly, maximum likelihood estimation becomes more challenging as the true shape parameter approaches zero from below, with numerical optimization routines more than often experiencing convergence issues (e.g., Gomes & Neves, 2008, for a comparison between M and ML within the univariate semiparametric setting). Estimates from the general estimator (left panel) are surprisingly low compared with those for local ML and spline ML, and with semiparametric estimates of the 100-and 10,000-year values using the moment estimator, shown in Figure 8. Agreement between the three estimators for the finite right endpoint cannot be expected to be nearly as good as that observed for T-year level, since the estimators are rather different in nature.
Local ML and spline ML estimates (Figure 9, center and right panels) show some agreement particularly for westerly, and north to north-easterly directions, where Figure 7 suggests ( ) = 0 and x F < ∞; the local ML estimator provides considerably more uncertain estimates, because of input estimates for ( ) near zero. The general endpoint estimator exhibits small variance in comparison, a distinctive trait that might be attributed to the simplicity in form not requiring external estimation of ; the small variance of this semiparametric estimator has been proved both theoretically and via extensive simulation studies (Fraga Alves & Neves, 2014). Increased uncertainty in the general endpoint estimator (Figure 9) corresponds to regions in Figure 7, for which ( ) = 0 and x F = ∞. A study of the sensitivity of the general endpoint estimator to specification of h and k( ) was undertaken, particularly due to the low estimates it yields. The study revealed that the general endpoint estimator is in fact remarkably stable with respect to both h and k( ). We must of course keep in mind that all of the estimates in Figure 9 assume a finite right endpoint, whereas the evidence from Figure 7 is that this is not the case for all directions. Note further that the asymptotic distribution for the general endpoint estimator is nonnormal, and therefore the estimated bootstrap 95% confidence bounds offer a limited view of its asymptotic variance. A similar remark applies to the local-ML estimator for the right endpoint of the generalized Pareto distribution for the irregular case for ∈ (−1, −1∕2].
In summary, application of the parametric and semiparametric methodologies developed in Sections 4-6 above to the sample of directional storm peak significant wave height suggests that estimates for , and 100-year and 10,000-year levels are in good qualitative agreement. Where differences occur, they can be explained in terms of specific modeling assumptions made, rather than in terms of fundamental differences in the underlying approaches to extreme value analysis.

DISCUSSION AND CONCLUSIONS
This article presents a framework for inference on nonstationary peaks over threshold, reconciling approaches from semiparametric and parametric extreme value analysis, in application to directional ocean storm severity. The key components of the framework are (a) estimation of nonstationary extreme value threshold, and (b) estimation of tail characteristics from threshold exceedances, including extreme quantiles and finite right endpoint when appropriate. Threshold estimation is performed using a nonstationary extension of a heuristic approach for semiparametric moment (M) and parametric maximum likelihood (ML) estimators. Tail characteristics and extreme quantiles are then estimated based on semiparametric M, local parametric ML and spline ML estimators. Given threshold, the two streams of extreme value inference mirror each other. In the spline ML approach, a penalized B-splines representation with compact support is used: each basis function is nonzero on a specific interval of the covariate domain. This feature plays a similar role to bandwidth of directional neighborhoods in the semiparametric M and parametric local ML approaches.
We also consider estimators for the nonstationary finite right endpoint, including extension of the semiparametric general estimator of Fraga Alves and Neves (2014). Hypothesis tests, used to characterize the max-domain of attraction of storm peak significant wave height as a function of direction, suggest that the distributions of storm peak significant wave height for storms from the Atlantic and Norwegian Sea are unbounded above.
Inferences regarding directional thresholds for storm peak significant wave height are in good agreement over the covariate domain. Estimates for 100 and 10,000-year levels are also in reasonable agreement. Estimates for the right endpoint are more different across approaches, and are influenced by the specifics of modeling assumptions made associated with the different estimation strategies. For the application considered, both parametric and semiparametric inference provides similar characterizations of extreme nonstationary ocean environments. Indeed, we illustrate how ideas from the semiparametric and parametric schools of thought can be used in tandem to exploit the desirable features of the approaches, while overcoming some obvious pitfalls. For example, threshold estimation (used for both semiparametric and parametric analysis) is motivated by an inherently nonparametric heuristic in Section 4.
Parametric approaches to nonstationary extremes are relatively well studied due in part to the wide range of flexible covariate representations for generalized Pareto parameters for threshold exceedances, and associated methods for regression and assessment of model fit. In contrast, from a semiparametric perspective, an exact generalized Pareto correspondence need not be assumed, offering greater flexibility. However, avoiding a particular model choice from the outset generally results in increased uncertainty of estimates for and high quantiles. Nevertheless, we show in this work that semiparametric and parametric approaches perform rather similarly when set up reasonably. We hope that the fusion of ideas from parametric and semiparametric approaches to extreme value analysis outlined in this article provide a basis for increased understanding and quantification of extreme phenomena in environmental and engineering applications.

ACKNOWLEDGMENTS
Cláudia Neves gratefully acknowledges support from EPSRC-UKRI Innovation Fellowship grant EP/S001263/1 and project FCT-UIDB/00006/2020. The authors thank two reviewers for their comments on an earlier version of the manuscript.
Data are available from the authors upon request.

R E F E R E N C E S APPENDIX A. BASIS FOR INFERENCE ON DIRECTIONAL EXTREMES VIA THE POT METHOD
The contents of this appendix build on appendix B of de Haan et al. (2015). This article offers added flexibility to the latter by allowing = (s) to vary with s ≥ 0, where s might represent direction , or time or some other covariate. As a consequence, the right endpoint does not need to be assumed constant in s. We will not delve into the theoretical details in terms of explicit smoothness and boundedness conditions, needing to be in place particularly by assuming h = h n > 0. These are clearly beyond the scope of this article, but we envisage that the probabilistic underpinning to this work will stem from chapter 9 of de Haan and Ferreira (2006). For each direction s ≥ 0, let (X 1 (s), X 2 (s), … , X n (s)) be a vector of i.i.d. random variables with common distribution function F s (x), for all x ∈ R, absolutely continuous with right endpoint x F s ≤ ∞. Assume F s ∈ (G (s) ), for some (s) ∈ R and for every s ≥ 0, that is, that condition (1) holds locally for each s. In this setting, theorem 1.1.6 of de Haan and Ferreira (2006) ascertains that it is possible to replace n with t running over the real line in such a way that (1) becomes equivalent to the following extreme value condition: there exists a positive function a * s such that for all x with 1 + (s)x > 0. The limit in (A1) is the tail distribution function (also known as survival function) of the generalized Pareto (GP) distribution with shape parameter (s) ∈ R, given by (1 + (s)x) −1∕ (s) . The extreme value condition (A1) is often key for describing rare events' behavior in lieu of the dual max-domain of attraction characterization F ∈ (G ).

(i) Parametric approach
Taking a parametric view, the peaks over threshold (POT)-domain of attraction condition (A1) prescribes the GP distribution as the proper fit to the normalized exceedances given these are above a certain high threshold near the right endpoint x F s . With minimal notational changes around a fixed (deterministic) threshold u, it is straightforward to see that condition (A1) implies locally uniformly in x > u, for each s ≥ 0, with (u) > 0 (we omit the subscript s on and below) for simplicity of notation) and H , , (x) ∶= 1 − (1 + (x − )∕ ) −1∕ , for all x such that 0 < H , , (x) < 1, with location ∈ R and scale > 0. Informally, F [u] s (x) ∶= P{X 1 (s) ≤ x + u | X 1 (s) > u} ≈ H (s),u, (u) (x), for all x > 0 and u near the right endpoint x F s , with the scale parameter implicitly defined in terms of s through the threshold u(s).
For each s, the limiting relation (A2) provides the probabilistic underpinning for fitting a GP distribution function to the unconditional tail distribution function F s (x) ∶= 1 − F s (x) with x sufficiently large. This becomes more evident since F s (x) = (1 − F s (u))F [u] s (x) + F s (u), from which for x > u, as u → x F s . Finally, we note that ( 1 − H (s),u, (u) (x) ) (1 − F s (u)) = H (s), * , * (u) (x), where * − u = (u) U H (1 − F s (u)), * (u) = (u)(1 − F s (u)) (s) , and U H standing for the tail quantile function pertaining to the standard GPD, that is U H (t) ∶= ( 1 1 − H (s),0,1 ) ← (t) = t (s) − 1 (s) , for all t ≥ 1 (the left arrow indicates the left-continuous inverse). The representation (A4) facilitates the view that, in practice, changes in the threshold (e.g., through covariates) will be reflected in the scale parameter. In turn, the approach for inference is reflected in the way we go choose to go about the term 1 − F s (u) for this to become statistically meaningful. In order to able to perform large sample inference drawing on the POT-GPD framework streamlined above, we now make the threshold dependent on the sample size n and u = u(n) will naturally become larger as n goes to infinity. A parametric approach typically advocates for a large enough threshold to be fixed and inference to be conducted on the basis of the resulting POT framework, whereby the expected number of exceedances above the selected threshold is a random number K s , say, satisfying (n/K s )(1 − F s (u n )) → 1 in probability, as n → ∞. This suggests estimation of (1 − F s (u)) via the analogous tail empirical distribution function (stepping up by 1/n at each observation) evaluated at u, adding up to 1 − K s /n in the above, associated with the random number K s of exceedances of u at direction (or location) s. Hence, for a given (fixed) k s , the location and scale parameters in (A4) become: Therefore, the crux of parametric inference for extremes values lays in the estimation of the shape and scale parameters, respectively, (s) and s = * (u).

(ii) Semiparametric approach
It will be notationally cleaner to express the argument (A1) in terms of the pertaining tail quantile function U s ∶= (1∕(1 − F s )) ← . Note that U s (t) is nondecreasing and provides a straightforward link to an extreme quantile, with the right endpoint representing the ultimate quantile: lim t→∞ U s (t) = U s (∞) = x F s . To this effect, we make t in (A1) depend on the (possibly unknown) sample size n at each location s through replacing it by U s (n/k s ), where k s is an intermediate sequence of positive integers such that k s = k s (n) → ∞ and k s /n → 0, as n → ∞. This is possible because (A1) holds uniformly in x. Hence, we have for the left-hand side of (A1): with a ⋆ s (n∕k s ) = a s (U s (n∕k s )). For simplicity, we consider regularly spaced independent vectors (X 1 (s), X 2 (s), … , X n (s)), s = 1, 2, … , m, … with i.i.d. components, with partial tally of N = n × m ∈ N observations across the whole system, and where n is potentially unknown (without affecting inference on extremes), yet assumed large (n → ∞). In this setting, the basic extreme value condition is (cf., (A3)): for all x with 1 + (s)x > 0, uniformly in s = 1, 2, … , with > 0 a continuous and smooth function satisfying (1∕m) ∑ m s=1 (s) → 1, as m → ∞, that is, the resulting sequence of weights { (m)} m∈N is Cesàro summable. The latter is to maintain integrity, also ensuring that the stationary case is well defined. In particular, the case of complete stationarity, corresponding to omni-directional data in the context of this article, is recovered if (s) is identically one over the stipulated range for s. The interest lies in the estimation of the various extreme value indices (s), and the scale and location terms, respectively a ⋆ s (N∕k s ) and U s (N/k s ), now with k s ∶= [ (s) × k] and [•] standing for integer part. Since the Nth-order statistic X N−k s ∶N (s) is close to U s (N/k s ), if k s → ∞, k s /N → 0, as N → ∞, we shall adopt it, as the usual estimator for the thresholdÛ s (N∕k s ) = X N−k s ∶N (s). The random adaptive threshold in this setting emulates the nonstationarity mirrored in the scale s > 0 which features the parametric setting (i).
The role of the of the (deterministic) function can perhaps be better understood if the domain of attraction condition (A5), characterizing nonstationary POT, is formulated in terms of their respective inverse functions intervening in both hand-sides. To this end, we use t → ∞ in place of N/k → ∞, so that (A5) translates into: ))} = (s) ( 1 − H (s),0,1 (x) ) , which, by inversion in terms of t∕y = 1∕ { 1 − F s ( U s (t∕ (s)) + x a ⋆ s (t∕ (s)) )} , and similarly for y∕ (s) = (1 + (s)) −1∕ (s) on the right-hand side, yields the desired condition:

B.3 Estimation of T-year level and finite right endpoint
The procedure for estimation of an extreme level and of the right endpoint, if appropriate, for any of the three methods is summarized in the following algorithm for clarity. The algorithm assumes that the nonstationary threshold u( ), ∈  has been predetermined through Algorithm 1 in connection with the adopted local estimators for .
Algorithm 3. Extreme quantile (including T-year level) and finite right endpoint estimation using M, local ML, and spline ML based estimators