Extreme Value Theory and Statistics of Univariate Extremes: A Review



Statistical issues arising in modelling univariate extremes of a random sample have been successfully used in the most diverse fields, such as biometrics, finance, insurance and risk theory. Statistics of univariate extremes (SUE), the subject to be dealt with in this review paper, has recently faced a huge development, partially because rare events can have catastrophic consequences for human activities, through their impact on the natural and constructed environments. In the last decades, there has been a shift from the area of parametric SUE, based on probabilistic asymptotic results in extreme value theory, towards semi-parametric approaches. After a brief reference to Gumbel's block methodology and more recent improvements in the parametric framework, we present an overview of the developments on the estimation of parameters of extreme events and on the testing of extreme value conditions under a semi-parametric framework. We further discuss a few challenging topics in the area of SUE. © 2014 The Authors. International Statistical Review © 2014 International Statistical Institute

1 The Role of Extremes in Society and Scope of the Paper

Statistics of univariate extremes (SUE) has been successfully used in the most diverse fields, such as finance, insurance and risk theory, where the value at risk at any level p (the size of the loss that occurred with small probability p) and the adjustment coefficient (a rudimentary measure of risk in a collective of insurance risks) are important parameters of extreme or even rare events. Also, in fields like biology and environment, the Weibull tail coefficient, the regular variation coefficient of the inverse failure rate function, probabilities of exceedance of high levels and endpoints of underlying models (lifetime of human beings or ultimate records in the field of athletics) are relevant extreme events' parameters or functionals. Statistical problems in all these areas have direct ethical, social, economic and environmental impact, and this is one of the reasons that statistics of extremes, in general, and SUE, in particular, have faced a huge development in the last decades. Indeed, rare events can have catastrophic consequences for human activities, through their impact on the natural and constructed environments. The recent development of a sophisticated methodology for the estimation and prediction of functionals of rare events has contributed to saving endangered natural resources and to modelling climate, earthquakes and other environmental phenomena, like precipitation, temperature and floods, situations where we have to deal with large risks or with very low probabilities of overpassing (underpassing) a high (low) level. 
From a theoretical point of view, the key results obtained by Fisher & Tippett 1928 on the possible limiting laws of the maximum of a random sample $(X_1,\dots,X_n)$ of independent and identically distributed (IID) random variables (RVs), formalised by Gnedenko 1943 and used by Gumbel 1958 for applications of extreme value (EV) theory (EVT) in engineering subjects, are among the main tools behind the rapid growth of statistical EVT in the last decades. The statistical applications of EVT gave emphasis to the relaxation of the independence condition, to the consideration of multivariate and spatial frameworks and to an increasing use of regular variation and point process approaches. These topics are well documented in books by David 1970, Galambos 1978, Leadbetter et al., 1983, Resnick 1987, Arnold et al., 1992, Falk et al., 1994, Embrechts et al., 1997, Reiss & Thomas 1997, Coles 2001, David & Nagaraja 2003, Beirlant et al., 2004, Castillo et al., 2005, de Haan & Ferreira 2006, Resnick 2007, Markovich 2007 and their subsequent editions. For an overview of most of the topics in this field, see the recent volumes of Extremes 11:1 (2008) and Revstat 10:1 (2012). Among review papers in the area of SUE, we mention Gomes et al., 2008a, 2007b, Neves & Fraga Alves 2008, Hüsler & Peng 2008, Beirlant et al., 2012 and Scarrott & MacDonald 2012.

In Section 2 of this review paper, we provide some details related to the non-degenerate limiting behaviour of the sequence of maximum values, other top order statistics (OSs) and excesses over high thresholds. Section 3 is dedicated to the most common parametric models in SUE and to a discussion of the more recent semi-parametric frameworks. Estimation procedures are discussed in Sections 4, 5 and 6. In Section 7, we briefly discuss SUE for censored data. Section 8 briefly covers testing issues under either parametric or semi-parametric frameworks. In Section 9, we make a brief reference to the estimation of the extremal index. Finally, in Section 10, we mention a few other relevant topics in the area and a few important open problems that we think should be addressed in the near future.

2 Probabilistic Backgrounds of EVT—Limiting Structure of Maxima, Top OSs and Excesses Over High Thresholds

The field of EVT goes back to 1927, when Fréchet 1927 formulated the functional equation of stability for maxima, which later was solved with some restrictions by Fisher & Tippett 1928 and finally by Gnedenko 1943 and de Haan 1970. Let us assume that we have access to a sample, $(X_1,\dots,X_n)$, of IID or even stationary and weakly dependent RVs from an underlying model F, and let us denote by $(X_{1:n}\le\dots\le X_{n:n})$ the sample of associated ascending OSs.

2.1 Limiting Behaviour of the Sequence of Maximum Values

Let us further assume that it is possible to linearly normalise the sequence of maximum values, $\{X_{n:n}\}_{n\ge 1}$, so that we obtain a non-degenerate limit for the sequence $(X_{n:n}-b_n)/a_n$, with $a_n>0$ and $b_n\in\mathbb{R}$. Then, Gnedenko's extremal type theorem (ETT) assures us that such a limiting RV has a cumulative distribution function (CDF) of the type of the EV distribution (EVD) given by

$$EV_\xi(x)=\begin{cases}\exp\left(-(1+\xi x)^{-1/\xi}\right),\ 1+\xi x>0, & \text{if } \xi\neq 0,\\ \exp\left(-\exp(-x)\right),\ x\in\mathbb{R}, & \text{if } \xi=0,\end{cases} \tag{2.1}$$

where ξ is the so-called EV index (EVI), the primary parameter in SUE. We then say that F is in the max domain of attraction (MDA) of $EV_\xi$, in (2.1), and use the notation $F\in\mathcal{D}_M(EV_\xi)$. The parameter ξ measures essentially the weight of the right tail function (RTF),

$$\overline{F}(x):=1-F(x).$$

If ξ < 0, the right tail is short and light; that is, xF:= sup{x:F(x) < 1}, the right endpoint of F, is finite. This class is called the Weibull class and contains among others the uniform and reverse Burr CDFs. If ξ > 0, the right tail is heavy, of a negative polynomial type, and F has an infinite right endpoint. Examples in this class (Fréchet class) are the Pareto, Burr, Student, α-stable (α < 2) and log-gamma CDFs. If ξ = 0, the right tail is of an exponential type, and the right endpoint can then be either finite or infinite. This class (Gumbel class) encompasses the exponential, normal, lognormal, gamma and classical Weibull CDFs, with an infinite right endpoint, but also models with a finite right endpoint, like math formula, for x < xF, δ > 0, and C > 0.

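In Gnedenko's pioneering paper and also nowadays in many applications of EVT, the EVD, in (2.1), is often rewritten for the three domains of attraction as follows:

$$\begin{array}{lll}\text{Type I (Gumbel):} & \Lambda(x)=\exp\left(-\exp(-x)\right), & x\in\mathbb{R},\\ \text{Type II (Fréchet):} & \Phi_\alpha(x)=\exp\left(-x^{-\alpha}\right), & x\ge 0,\\ \text{Type III (max-Weibull):} & \Psi_\alpha(x)=\exp\left(-(-x)^{\alpha}\right), & x\le 0,\end{array}\qquad \alpha>0. \tag{2.2}$$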

The CDFs, either in (2.1) or in (2.2), are appropriate when the data consist of a set of maxima.
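As a hedged illustration of how the unified and the three-type parametrisations relate, the following Python sketch evaluates the EV CDF in its unified form, $EV_\xi(x)=\exp(-(1+\xi x)^{-1/\xi})$ for $1+\xi x>0$ (read as $\exp(-\exp(-x))$ when ξ = 0), and checks numerically that the Fréchet and max-Weibull types are location-scale versions of it. The function names are illustrative, and the standard forms $\Phi_\alpha(x)=\exp(-x^{-\alpha})$ and $\Psi_\alpha(x)=\exp(-(-x)^{\alpha})$ are assumed for the classical types.

```python
import math


def ev_cdf(x: float, xi: float) -> float:
    """Unified EV CDF, EV_xi(x); xi = 0 is the Gumbel case."""
    if xi == 0.0:
        return math.exp(-math.exp(-x))
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: below the left endpoint when xi > 0,
        # above the right endpoint when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def frechet_cdf(x: float, alpha: float) -> float:
    """Type II (Frechet) CDF, Phi_alpha(x), alpha > 0 (assumed standard form)."""
    return math.exp(-x ** (-alpha)) if x > 0.0 else 0.0


def max_weibull_cdf(x: float, alpha: float) -> float:
    """Type III (max-Weibull) CDF, Psi_alpha(x), alpha > 0 (assumed standard form)."""
    return math.exp(-((-x) ** alpha)) if x < 0.0 else 1.0


if __name__ == "__main__":
    alpha = 2.0
    # Phi_alpha(x) = EV_{1/alpha}(alpha * (x - 1))
    print(frechet_cdf(1.5, alpha), ev_cdf(alpha * (1.5 - 1.0), 1.0 / alpha))
    # Psi_alpha(x) = EV_{-1/alpha}(alpha * (x + 1))
    print(max_weibull_cdf(-0.5, alpha), ev_cdf(alpha * (-0.5 + 1.0), -1.0 / alpha))
```

The two printed pairs coincide, which is the sense in which ξ = 1/α > 0 corresponds to the Fréchet type and ξ = −1/α < 0 to the max-Weibull type.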

2.2 Joint Limiting Behaviour of Top OSs

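Apart from the ETT and the already mentioned EVD, in (2.1), it is also sensible to mention the multivariate EV (MEV) CDF, related to the limiting distribution of the k largest values $X_{n-i+1:n}$, $1\le i\le k$, also called the extremal process (Lamperti, 1964; Dwass, 1964), with associated probability density function

$$h_\xi(x_1,\dots,x_k)=ev_\xi(x_k)\prod_{j=1}^{k-1}\frac{ev_\xi(x_j)}{EV_\xi(x_j)}\quad\text{if } x_1>\dots>x_k, \tag{2.3}$$

where

$$ev_\xi(x)=\frac{\mathrm{d}EV_\xi(x)}{\mathrm{d}x}.$$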

2.3 Limiting Structure of the Excesses Over a High Threshold

Just as mentioned earlier, rather than the maxima, we can consider all values larger than a given threshold. The differences between these values and the threshold are called excesses over the threshold. These excesses are typically assumed to have a $GP_\xi(x/\sigma)=:GP_{\xi,\sigma}(x)$ CDF, σ > 0, with $GP_\xi$ as the generalised Pareto distribution (GPD), defined by

$$GP_\xi(x)=\begin{cases}1-(1+\xi x)^{-1/\xi},\ 1+\xi x>0,\ x>0, & \text{if } \xi\neq 0,\\ 1-\exp(-x),\ x>0, & \text{if } \xi=0.\end{cases} \tag{2.4}$$

This distribution is generalised in the sense that it subsumes certain other distributions under a common parametric form. In the $GP_{\xi,\sigma}(x)$ CDF, ξ is the important shape parameter of the distribution and σ is an additional scaling parameter.

The distribution of excesses over a high threshold u is defined to be

$$F_u(x):=P(X-u\le x\mid X>u)=\frac{F(u+x)-F(u)}{1-F(u)}$$

for $0\le x\le x^F-u$. Depending on the field of applications, different names for $F_u$ arise, for instance excess life or residual lifetime in reliability or medical statistics, excess of loss in an insurance framework. According to Balkema & de Haan 1974 and Pickands 1975, the key result in EVT that explains the importance of the GPD is that $F\in\mathcal{D}_M(EV_\xi)$ if and only if

$$\lim_{u\uparrow x^F}\ \sup_{0\le x<x^F-u}\left|F_u(x)-GP_{\xi,\sigma(u)}(x)\right|=0$$

for some positive function σ(u). Thus, the GPD is the natural model for the unknown excess distribution above sufficiently high thresholds. This approach based on the GPD approximation is called the peaks-over-threshold (POT) approach.
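A small numerical check can make the GPD approximation concrete. The sketch below, in Python, assumes the GPD form $GP_\xi(x)=1-(1+\xi x)^{-1/\xi}$ and uses a strict Pareto model, $F(x)=1-x^{-1/\xi}$ for x > 1, as an illustrative example not taken from the text. For this model the excess distribution is exactly a GPD with σ(u) = ξu, whatever the threshold u > 1. All function names are hypothetical.

```python
import math


def gp_cdf(x: float, xi: float) -> float:
    """Standard GPD CDF, GP_xi(x), for x > 0 (exponential when xi = 0)."""
    if x <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-x)
    t = 1.0 + xi * x
    if t <= 0.0:
        return 1.0  # beyond the finite right endpoint (xi < 0)
    return 1.0 - t ** (-1.0 / xi)


def pareto_cdf(x: float, xi: float) -> float:
    """Strict Pareto CDF with EVI xi > 0 (illustrative model)."""
    return 1.0 - x ** (-1.0 / xi) if x > 1.0 else 0.0


def excess_cdf(cdf, u: float, x: float) -> float:
    """F_u(x) = P(X - u <= x | X > u) for a given CDF and threshold u."""
    return (cdf(u + x) - cdf(u)) / (1.0 - cdf(u))


if __name__ == "__main__":
    xi, u = 0.5, 10.0
    sigma_u = xi * u  # scale function sigma(u) for the strict Pareto model
    for x in (0.5, 3.0, 20.0):
        exact = excess_cdf(lambda y: pareto_cdf(y, xi), u, x)
        approx = gp_cdf(x / sigma_u, xi)
        print(x, exact, approx)
```

For models that are only in the MDA of $EV_\xi$, the two columns would agree only in the limit as u approaches the right endpoint.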

2.4 First-order, Second-order and Higher-order Frameworks

We further use the notations

$$F^{\leftarrow}(t):=\inf\{x:F(x)\ge t\}$$

for the generalised inverse function of $F$, $RV_a$ for the class of regularly varying functions at infinity with an index of regular variation ‘a’, that is, positive Borel measurable functions g(·) such that $g(tx)/g(t)\to x^a$, as $t\to\infty$, for all x > 0 (see Bingham et al., 1987, for details on regular variation), and

$$U(t):=F^{\leftarrow}(1-1/t),\quad t\ge 1,$$

for the tail quantile function.

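A necessary and sufficient condition for $F\in\mathcal{D}_M(EV_\xi)$, provided by de Haan 1984, is the following

$$\lim_{t\to\infty}\frac{U(tx)-U(t)}{a(t)}=\begin{cases}\dfrac{x^{\xi}-1}{\xi}, & \text{if } \xi\neq 0,\\ \ln x, & \text{if } \xi=0,\end{cases} \tag{2.5}$$

for all x > 0, where a is a positive measurable function. Condition (2.5) is known as the first-order condition. For instance, in the Fréchet MDA, we can choose a(t) = ξU(t), and thus $F\in\mathcal{D}_M(EV_\xi)$, ξ > 0, if and only if $\lim_{t\to\infty}U(tx)/U(t)=x^{\xi}$, x > 0, which means that U is regularly varying with index ξ, $U\in RV_{\xi}$, or equivalently $\overline{F}:=1-F\in RV_{-1/\xi}$.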

However, a first-order condition is in general not sufficient to study properties of tail parameters' estimators, in particular asymptotic normality. In that case, a second-order condition is required, specifying the rate of convergence in (2.5). Different types of conditions exist, some expressed in terms of either $\overline{F}=1-F$ or U or ln U. The most common one is the following

$$\lim_{t\to\infty}\frac{\dfrac{U(tx)-U(t)}{a(t)}-\dfrac{x^{\xi}-1}{\xi}}{A(t)}=H_{\xi,\rho}(x):=\frac{1}{\rho}\left(\frac{x^{\xi+\rho}-1}{\xi+\rho}-\frac{x^{\xi}-1}{\xi}\right), \tag{2.6}$$

x > 0, where ρ ≤ 0 is a second-order parameter and A is a function possibly not changing in sign and tending to 0 as $t\to\infty$, such that |A| is regularly varying with index ρ (de Haan & Stadtmüller, 1996). Note that in both (2.5) and (2.6), the boundary cases of the limit follow from interpreting the Box–Cox function $(x^{c}-1)/c$ as the logarithm when the power c equals 0. Again, the rate of convergence in (2.6) can also be specified in a third-order condition (Gomes et al., 2002; Fraga Alves et al., 2003a, 2003b, 2006). Other limiting results in EVT, important for the development of parametric approaches to SUE, can be found in Gomes et al., 2008a.

3 Models in SUE

Statistical inference about rare events is clearly linked to observations that are extreme in some sense. There are different ways to define such observations, and such definitions lead to different alternative approaches to SUE.

3.1 Gumbel's Approach or BM Method

When the sample size $n\to\infty$, and because of the limiting result given before for the normalised sequence of maximum values, that is, the ETT, we can write

$$P(X_{n:n}\le x)=F^{n}(x)\approx EV_\xi\left(\frac{x-b_n}{a_n}\right)=EV_\xi\left(\frac{x-\lambda_n}{\delta_n}\right), \tag{3.1}$$

with $EV_\xi(x)$ given in (2.1) and $(\lambda_n,\delta_n)\in\mathbb{R}\times\mathbb{R}^{+}$ an unknown vector of location and scale parameters that replaces the attraction coefficients $(b_n,a_n)$ in the normalised sequence of maximum values, $(X_{n:n}-b_n)/a_n$.

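Remark 1. Any result for maxima (top OSs) can be easily reformulated for minima (low OSs). Indeed, $\min_{1\le i\le n}X_i=-\max_{1\le i\le n}(-X_i)$, and consequently, when $n\to\infty$,

$$P(X_{1:n}\le x)\approx 1-EV_\xi\left(-\frac{x-\lambda_n}{\delta_n}\right)=:EV^{*}_\xi\left(\frac{x-\lambda_n}{\delta_n}\right),$$

with $EV^{*}_\xi(x)=1-\exp\left(-(1-\xi x)^{-1/\xi}\right)$, for 1 − ξx > 0 and $(\lambda_n,\delta_n)\in\mathbb{R}\times\mathbb{R}^{+}$.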

The aforementioned ETT was used by Gumbel in several papers, which culminated in his 1958 book, to give approximations of the type of the one provided in (3.1) but for any of the models in (2.2). He suggested the first model in SUE, usually called the annual maxima or block maxima (BM) model or the EV univariate model or merely Gumbel's model. Under Gumbel's model, the sample of size n is divided into k sub-samples of size r (usually associated with k years, with n = r × k and reasonably large r). Next, the maximum of the r observations in each of the k sub-samples is considered, and one of the extremal models in (2.2), obviously with extra unknown location and scale parameters, is fitted to the sample of those k maximum values. Nowadays, whenever this approach is used, which is still quite popular in environmental sciences, it is more common to fit to the data an EVD, $EV_\xi((x-\lambda_r)/\delta_r)$, with $EV_\xi$ given in (2.1) and $(\lambda_r,\delta_r,\xi)\in\mathbb{R}\times\mathbb{R}^{+}\times\mathbb{R}$ unknown location, scale and ‘shape’ parameters. All statistical inference is then related to the aforementioned models.
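The block construction just described can be sketched in a few lines of Python. The example below splits a sample into k blocks of size r, keeps each block maximum, and then fits only the Gumbel sub-model (ξ = 0) by the method of moments, using the known Gumbel mean λ + γδ (γ being the Euler–Mascheroni constant) and variance π²δ²/6. This moment fit is a deliberate simplification chosen for brevity; it is not the ML fit of the full EVD, and the function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def block_maxima(sample, r):
    """Maxima of consecutive blocks of size r; an incomplete last block is dropped."""
    k = len(sample) // r
    return [max(sample[j * r:(j + 1) * r]) for j in range(k)]


def gumbel_moment_fit(maxima):
    """Method-of-moments estimates (lambda, delta) for the Gumbel model.

    Uses E[M] = lambda + gamma * delta and Var[M] = pi^2 * delta^2 / 6.
    """
    delta = statistics.stdev(maxima) * math.sqrt(6.0) / math.pi
    lam = statistics.mean(maxima) - EULER_GAMMA * delta
    return lam, delta


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
    maxima = block_maxima(data, r=4)  # three blocks of four observations
    print(maxima)
    print(gumbel_moment_fit(maxima))
```

In practice r should be large enough for the ETT approximation to be reasonable, which limits the number k of maxima available from a fixed n.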

3.2 Multivariate and Multidimensional EV Approaches: The Method of Largest Observations

Although Gumbel's statistical procedure has proven to be fruitful in the most diverse situations, several criticisms have been made on Gumbel's technique, and one of them is the fact that we are wasting information when using only observed maxima and not further OSs, if available, because they certainly contain useful information about the RTF underlying the data. On the other hand, in most areas of application, there is no natural seasonality of the data, and in such framework, the method of sub-samples is subjective and artificial.

To infer on the right-tail weight of the underlying model, it seems sensible to think of a small number k of top OSs from the original data. Indeed, if we have daily data, some years may have several values among those top OSs (which are for sure relevant for making inference upon the RTF), and other years may contain none of those top values. We can thus say that such an approach provides additional information, which has been disregarded in the traditional Gumbel methodology.

This approach depends on the joint limiting distributional behaviour of those top OSs. When the sample size n is large and for a fixed k, it is sensible to consider, on the basis of the probability density function hξ defined in (2.3), the approximation

$$P\left(X_{n:n}\le x_1,\dots,X_{n-k+1:n}\le x_k\right)\approx H_\xi\left(\frac{x_1-\lambda_n}{\delta_n},\dots,\frac{x_k-\lambda_n}{\delta_n}\right),$$

where $H_\xi$ is the MEV CDF associated with $h_\xi$, and $\lambda_n$ and $\delta_n$ are unknown location and scale parameters, respectively, to be estimated on the basis of the k top OSs in the sample of size n. This approach to SUE is the so-called MEV model or largest observations (LOB) method or extremal process. Under this approach, it is easier to increase the number k of observations, contrary to what happens in Gumbel's approach, where a larger number n of original observations is usually needed. Such an approach has been introduced first, in a slightly different context, by Pickands 1975 and was used by Weissman 1978 and Gomes 1978, 1981.

Note finally that it is easy to combine both approaches. In each of the sub-samples associated with Gumbel's classical approach, we can collect a few top OSs modelled through a MEV model and next consider the multidimensional EV model, based on the multivariate sample

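$$\left(\underline{X}_1,\dots,\underline{X}_m\right),$$

where $\underline{X}_j=(X_{1j},\dots,X_{i_j j})$, $1\le j\le m$, are MEV vectors.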

3.3 The POT Approach

Another approach to SUE, in a certain sense parallel to the MEV model, is the one in which we restrict our attention only to the observations that exceed a certain high threshold u, fitting the appropriate statistical model to the excesses over such a high level u. From the results in Section 2, we obtain the approximation

$$F_u(x)=P(X-u\le x\mid X>u)\approx GP_\xi(x/\sigma),$$

with GPξ(·) as the GPD in (2.4). We are then led to consider a deterministic high-level u and work with the excesses. The adequate model is then the GPD. Such a model is the so-called Paretian excess model or POT model and was introduced by Smith 1987a. Here, all statistical inference is related to the GPD. Despite the existence of an extra scale parameter σ = σ(u), the POT model can also be regarded as a semi-parametric model whenever we work in the MDA of EVξ.

3.4 Bayesian Approaches

It has recently become more and more common to use Bayesian methods within EV analysis, as can be seen in the monographs by Reiss & Thomas 1997, by Coles 2001, by Beirlant et al., 2004 and references therein.

3.5 Summary of Parametric Approaches and a Link to Semi-parametric Frameworks

More recently, the LOB and POT methodologies have been considered under a semi-parametric framework. There is then no fitting of a specific parametric model, dependent upon a location parameter λ, a scale parameter δ and a ‘shape’ parameter ξ. It is merely assumed that $F\in\mathcal{D}_M(EV_\xi)$, with $EV_\xi$ given in (2.1) and ξ being the unique primary parameter of extreme events to be estimated, on the basis of a few top observations and according to adequate methodology, to be dealt with in Section 5. We now summarise the different approaches to SUE here discussed:

  1. Parametric approaches:
    I. The univariate EV model (for the k maximum values of sub-samples of size r, n = r × k) (Gumbel's classical approach or BM method)
    II. The MEV model or LOB method (for the k top OSs associated with the original sample of size n)
    III. The multidimensional EV model (MEV model for the $i_j$ top observations, j = 1,2,…,m, in sub-samples of size r, m × r = n). The choice m = k and $i_j=1$ for 1 ≤ j ≤ k gives rise to I; m = 1 (r = n) and $i_1=k$ gives rise to II.
    IV. The Paretian model for the excesses, $X_j-u$, given $X_j>u$, 1 ≤ j ≤ k, of a high deterministic threshold u, suitably chosen (POT approach)
    V. Bayesian approaches
  2. Semi-parametric approaches:
    VI. Under these approaches, we work with the k top OSs associated with all n observations or with the excesses over a high deterministic or random threshold, assuming only that the model F underlying the data is in $\mathcal{D}_M(EV_\xi)$ or in specific sub-domains of $\mathcal{D}_M(EV_\xi)$, with $EV_\xi(\cdot)$ provided in (2.1). The POT approach can thus be considered under this framework.

4 Estimation Under Parametric Frameworks

We now have several R packages for extreme values, such as evd, evdbayes, evir, ismev, extRemes, extremevalues, fExtremes, lmom, lmomRFA, lmomco, POT and SpatialExtremes, among others, that can help with most of the inferential procedures described in the following.

4.1 Gumbel's Approach or Block Maxima Method

Computational details on maximum likelihood (ML) estimation of (λ,δ,ξ) in the EV model, $EV_\xi((x-\lambda)/\delta)$, with $EV_\xi(x)$ given in (2.1), can be found in the works of Prescott & Walden 1980, 1983, Hosking 1985, Smith 1985 and Macleod 1989, among others.

As the ML estimators can be numerically difficult to handle, several alternative methods have been proposed for the estimation of (λ,δ,ξ). The probability weighted moment (PWM) method, introduced by Landwehr et al., 1979 and Greenwood et al., 1979, is an interesting alternative to the ML approach. The main idea of this method is to match the moments

$$M_{p,r,s}:=E\left(X^{p}\,(F(X))^{r}\,(1-F(X))^{s}\right),\quad p,r,s\in\mathbb{R},$$

with their empirical versions, similar to the classical method of moments. For the EVD, Hosking et al., 1985 show that $M_{1,r,0}$ can be explicitly computed, which leads to the PWM estimation of the parameters under play and, just like the ML approach, implies some restriction on the value of ξ.

Several other estimation methods for the EVD can be found in the literature. Among them, we mention the following: best linear unbiased estimation (Balakrishnan & Chan, 1992), method of moments (Christopeit, 1994) and minimum distance estimation (Dietrich & Hüsler, 1996).

Robust methods for the EVD have been studied by Dupuis & Field 1998, who derived B-optimal robust M-estimators for the case that the observations follow an EVD. Modifications of the ML estimator are presented by Coles & Dixon 1999, who suggest penalised ML (PML) estimators, showing that PML estimation improves the small-sample properties of a likelihood-based analysis. The BM method has been recently revisited by Dombry 2013 and Ferreira & de Haan 2013.

4.2 Multivariate and Multidimensional EV Approaches: The Method of LOB

For estimation procedures under these MEV approaches, see Weissman 1984, Smith 1986, Tawn 1988 and references therein. ML estimators of the unknown parameters in the multidimensional EV model have been studied by Gomes 1981. See also Smith 1984 and references therein. The use of concomitants of OSs to deal with statistical inference techniques in this model appears in Gomes 1984, 1985a. A comparison of the MEV model and the multidimensional EV model is performed by Gomes 1985b, 1989a. Discrimination among MEV models can also be found in Fraga Alves 1992.

4.3 The POT Approach

Maximum likelihood estimates of ξ and σ in a GPD, non-regular for ξ < −1/2, have been studied by Smith 1987a. A survey of the POT methodology, together with several applications, can be found in Davison & Smith 1990.

Just as for the EVD, as the ML estimators can be numerically hard to compute (Grimshaw, 1993), there have been several methods, other than ML, proposed for the estimation of ξ and σ. Hosking & Wallis 1987 suggest the use of PWM estimators (see also the comparative study of Singh & Guo, 1997). Castillo & Hadi 1997 have proposed estimators based on the elemental percentile method. PML estimators, containing a penalty function for the shape parameter, are presented by Coles & Dixon 1999 and Martins & Stedinger 2000. The PML estimator combines the flexibility of the ML estimator and the robustness of the PWM estimator. See also Resnick 1997, Crovella & Taqqu 1999 and references therein. The estimates depend significantly on the choice of the threshold, and several authors, among whom we mention McNeil 1997 and Rootzén & Tajvidi 1997, explicitly state that the selection of an appropriate threshold u, above which the GPD assumption is appropriate, is a difficult task in practice.
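To illustrate the PWM idea for the GPD, the following Python sketch uses the moments $a_s=E(X(1-F(X))^{s})$, which for a GPD with shape ξ < 1 and scale σ equal $\sigma/((s+1)(s+1-\xi))$, so that $\xi=2-a_0/(a_0-2a_1)$ and $\sigma=2a_0a_1/(a_0-2a_1)$. The empirical moments below use the plotting positions (i − 0.35)/n, which is one common choice and an assumption here, as are the function names. This is a sketch of the technique, not a reproduction of any specific published implementation.

```python
import math


def gpd_quantile(p: float, xi: float, sigma: float) -> float:
    """Quantile function of the GPD with shape xi and scale sigma."""
    if xi == 0.0:
        return -sigma * math.log(1.0 - p)
    return sigma * ((1.0 - p) ** (-xi) - 1.0) / xi


def gpd_pwm_fit(excesses):
    """PWM estimates (xi, sigma) of the GPD from a sample of excesses (needs xi < 1)."""
    ys = sorted(excesses)
    n = len(ys)
    # Empirical versions of a_0 = E[X] and a_1 = E[X (1 - F(X))];
    # the plotting position (i - 0.35) / n is an assumed convention.
    a0 = sum(ys) / n
    a1 = sum(y * (1.0 - (i - 0.35) / n) for i, y in enumerate(ys, start=1)) / n
    xi_hat = 2.0 - a0 / (a0 - 2.0 * a1)
    sigma_hat = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    return xi_hat, sigma_hat


if __name__ == "__main__":
    # Deterministic pseudo-sample: GPD quantiles on a regular grid of probabilities.
    n = 2000
    sample = [gpd_quantile((i - 0.5) / n, xi=0.2, sigma=1.0) for i in range(1, n + 1)]
    print(gpd_pwm_fit(sample))  # should be close to (0.2, 1.0)
```

With real data the excesses would be the observations above the chosen threshold u minus u, so the result inherits the threshold sensitivity discussed above.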

Robust estimation for the GPD was first addressed by Dupuis 1998, who provides the optimally biased robust estimator for the GPD and suggests a validation mechanism to guide the threshold selection. Peng & Welsh 2001 use the method of medians introduced by He & Fung 1999 and obtain estimators of the unknown parameters of the GPD with bounded influence functions. Juárez & Schucany 2004 implement the minimum density power divergence estimator (MDPDE), originally introduced by Basu et al., 1998, for the shape and scale parameters of the GPD. The MDPDE is indexed by a non-negative constant that controls the trade-off between robustness and efficiency. Frigessi et al., 2002 suggest an unsupervised alternative to the classical POT model, where a GPD is fitted beyond a threshold, which is selected in a supervised way. They suggest modelling the data with a dynamical mixture: one term of the mixture is a GPD and the other is a light-tailed density distribution. The weight of the GPD component is predominant for large values and takes the role of threshold selection. A recent comparison between the BM approach (with several block sizes, not only annual maximum values) and the POT approach has been performed by Engeland et al., 2004.

4.4 Bayesian Approaches

We begin with some examples of the most direct use of Bayesian methodology within EV analysis: Smith & Naylor 1987 compare Bayesian and ML estimators for the Weibull distribution; Ashour & El-Adl 1980, Lingappaiah 1984, Achcar et al., 1987 and Engelund & Rackwitz 1992 consider estimation for specific extremal types for maxima; Lye et al., 1993 consider estimation of the EVD; Pickands 1994 and de Zea Bermudez & Amaral-Turkman 2003 consider estimation of the GPD.

Smith 1999 discusses predictive inference aspects of Bayesian and frequentist approaches, and de Zea Bermudez et al., 2001 propose the use of a Bayesian predictive approach for the choice of the threshold, through a hierarchical Bayesian model. Stephenson & Tawn 2004 claim that in practice, the appropriate asymptotically motivated extremal model, either EVD or GPD, fitted to data that can be regarded as maxima or exceedances of a high threshold, reduces the Gumbel (exponential) type to a single point in the parameter space, and consequently, the Gumbel (exponential) model is never selected. They then decide to incorporate knowledge of the structure of the ETT into inference for the EVD and the GPD. To do this, they associate the probability pξ to the parameter subspace corresponding to the Gumbel (exponential) type. This approach requires an inference scheme that allows switching between the full EVD (GPD) and the Gumbel (exponential) sub-model. They then perform inference using reversible jump Markov chain Monte Carlo techniques. This Bayesian approach recognises the possibility that the data can come from any of the three extremal types. As a by-product of the analysis, posterior probabilities for math formula and math formula are obtained.

Diebolt et al., 2005 propose a quasi-conjugate Bayesian inference approach for the GPD with ξ > 0, through the representation of a heavy-tailed GPD as a mixture of an exponential distribution and a gamma distribution. For other papers on Bayesian approaches to, for instance, high-quantile estimation, see, for example, Coles & Powell 1996 and Coles & Tawn 1996, who provide a detailed review of Bayesian methods in EV modelling up to this date. See also Reiss & Thomas 1999, Walshaw 2000, Smith & Goodman 2000, Bottolo et al., 2003 and Tancredi et al., 2006, among others.

5 Semi-parametric EVI Estimation

Just as mentioned earlier, under a semi-parametric framework, we do not need to fit a specific parametric model based on scale, shape and location parameters, but only to assume that F is in the MDA, $\mathcal{D}_M(EV_\xi)$, for a suitable index ξ. Then we construct an estimator for this index based on the k-LOB in the sample $X_1,\dots,X_n$, where k needs to be an intermediate sequence, that is, such that $k\to\infty$, k = o(n) as $n\to\infty$. We start by reviewing the most classical estimators proposed in the literature. Then we give their main asymptotic properties and briefly discuss reduced-bias estimation and the choice of k.

5.1 Classical Estimators

5.1.1 Hill estimator

The most famous estimator of ξ > 0 is the Hill 1975 estimator

$$\hat\xi^{H}_{k,n}:=\frac{1}{k}\sum_{i=1}^{k}\left(\ln X_{n-i+1:n}-\ln X_{n-k:n}\right), \tag{5.1}$$

where $k=k(n)\to\infty$ in an appropriate way, so that an increasing sequence of upper OSs is used. One of the interesting facts concerning (5.1) is that various asymptotically equivalent versions of $\hat\xi^{H}_{k,n}$ can be derived through essentially different methods (such as the ML method or the mean excess function approach), showing that the Hill estimator is very natural. This estimator is based on the assumption that the RTF is of Zipf or Pareto form for large x, that is, $1-F(x)\sim Cx^{-1/\xi}$ as $x\to\infty$, for some ξ > 0 and C > 0. Hence, Hill's estimator is only applicable in case the EVI is known to be positive, that is, only in case the underlying CDF exhibits a heavy tail. Another reason for the success of this estimator is the fact that it can be interpreted as an estimator of the slope of the Pareto quantile plot, which is a graphical tool for testing whether the data are Pareto distributed. Indeed, as log-transformed Pareto distributed RVs are exponentially distributed, one can visually check the hypothesis of a strict Pareto behaviour by looking at the scatterplot with coordinates $(\log((n+1)/j),\ \log X_{n-j+1:n})$.
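A minimal Python sketch of the Hill estimator follows, written as the average of the k log-excesses $\ln X_{n-i+1:n}-\ln X_{n-k:n}$, 1 ≤ i ≤ k. The function name and the input checks are illustrative choices.

```python
import math


def hill(sample, k):
    """Hill estimator of a positive EVI based on the k top order statistics."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]  # X_{n-k:n}, the random threshold
    if threshold <= 0:
        raise ValueError("the threshold X_{n-k:n} must be positive")
    # xs[n - i] is X_{n-i+1:n}, the i-th largest observation.
    return sum(math.log(xs[n - i] / threshold) for i in range(1, k + 1)) / k


if __name__ == "__main__":
    data = [1.0, math.e, math.e ** 2, math.e ** 3]
    print(hill(data, k=2))                       # log-excesses 2 and 1, mean 1.5
    print(hill([10.0 * x for x in data], k=2))   # scale invariant: same value
    print(hill([x + 5.0 for x in data], k=2))    # not location invariant
```

Plotting the estimate against k (the so-called Hill plot) is the usual way of inspecting its sensitivity to the number of top OSs used.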

5.1.2 Kernel estimator

A general class of kernel estimators was given by Csörgő et al., 1985:

display math

This estimator is also restricted to the case ξ > 0. The Hill estimator is a member of this class, as it can be obtained by taking a uniform kernel K(x). The kernel estimators can be interpreted as weighted least squares regression estimators of the slope of the Pareto quantile plot in case one considers regression lines passing through a fixed anchor point. Kernel EVI estimators for a real ξ have been studied by Groeneboom et al., 2003.

5.1.3 Pickands estimator

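A simple estimator for the general case, $\xi\in\mathbb{R}$, is the Pickands 1975 estimator

$$\hat\xi^{P}_{k,n}:=\frac{1}{\ln 2}\,\ln\left(\frac{X_{n-\lfloor k/4\rfloor+1:n}-X_{n-\lfloor k/2\rfloor+1:n}}{X_{n-\lfloor k/2\rfloor+1:n}-X_{n-k+1:n}}\right),$$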

where ⌊x⌋ denotes the integer part of x.
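The Pickands estimator only needs three OSs. The Python sketch below assumes the usual form, $(1/\ln 2)\ln\{(X_{n-\lfloor k/4\rfloor+1:n}-X_{n-\lfloor k/2\rfloor+1:n})/(X_{n-\lfloor k/2\rfloor+1:n}-X_{n-k+1:n})\}$, and requires 4 ≤ k ≤ n and no ties among the three OSs involved. The function name is hypothetical.

```python
import math


def pickands(sample, k):
    """Pickands estimator of a real-valued EVI (assumed standard form)."""
    xs = sorted(sample)
    n = len(xs)
    if not 4 <= k <= n:
        raise ValueError("k must satisfy 4 <= k <= n")

    def top(m):
        # X_{n-m+1:n}, the m-th largest observation
        return xs[n - m]

    upper, middle, lower = top(k // 4), top(k // 2), top(k)
    return math.log((upper - middle) / (middle - lower)) / math.log(2.0)


if __name__ == "__main__":
    data = [0.0, 1.0, 3.0, 3.5, 5.0, 9.0]
    print(pickands(data, k=4))                             # (9 - 5) / (5 - 3) = 2
    print(pickands([10.0 + 2.0 * x for x in data], k=4))   # same value
```

Because only differences of OSs enter a ratio, the estimate is invariant to both location and scale changes of the data.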

5.1.4 Moment estimator

Dekkers et al., 1989 have proposed an alternative estimator that is not restricted to the case ξ > 0 and that has the following form

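$$\hat\xi^{M}_{k,n}:=M^{(1)}_{k,n}+1-\frac{1}{2}\left(1-\frac{\left(M^{(1)}_{k,n}\right)^{2}}{M^{(2)}_{k,n}}\right)^{-1},$$

with

$$M^{(j)}_{k,n}:=\frac{1}{k}\sum_{i=1}^{k}\left(\ln X_{n-i+1:n}-\ln X_{n-k:n}\right)^{j},\quad j\ge 1.$$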
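The moment estimator combines the first two empirical moments of the log-excesses. A short Python sketch, assuming the usual form $M^{(1)}+1-\frac12\{1-(M^{(1)})^{2}/M^{(2)}\}^{-1}$ with $M^{(j)}$ the mean of the j-th powers of $\ln X_{n-i+1:n}-\ln X_{n-k:n}$, is given below. The function name is illustrative.

```python
import math


def moment_estimator(sample, k):
    """Moment estimator of a real-valued EVI (assumed standard form)."""
    xs = sorted(sample)
    n = len(xs)
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < n")
    threshold = xs[n - k - 1]  # X_{n-k:n}
    if threshold <= 0:
        raise ValueError("the threshold X_{n-k:n} must be positive")
    log_excesses = [math.log(xs[n - i] / threshold) for i in range(1, k + 1)]
    m1 = sum(log_excesses) / k                  # M^(1), i.e. the Hill estimator
    m2 = sum(v * v for v in log_excesses) / k   # M^(2)
    # By Cauchy-Schwarz m1^2 <= m2; equality (all log-excesses equal)
    # makes the correction term undefined.
    return m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)


if __name__ == "__main__":
    data = [1.0, math.e, math.e ** 2, math.e ** 3]
    # M1 = 1.5, M2 = 2.5, so the estimate is 1.5 + 1 - 0.5 / 0.1 = -2.5
    print(moment_estimator(data, k=2))
```

The tiny sample is only there to make the arithmetic traceable; with so few OSs the value itself carries no statistical meaning.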

5.1.5 Generalised Hill estimator

Beirlant et al., 1996b proposed an estimator for $\xi\in\mathbb{R}$ by estimating the slope of the generalised quantile plot, $(\log((n+1)/j),\ \log UH_{j,n})$, j = 1,…,n, where $UH_{j,n}:=X_{n-j:n}\,\hat\xi^{H}_{j,n}$. This leads to the generalised Hill (GH) estimator defined as

$$\hat\xi^{GH}_{k,n}:=\frac{1}{k}\sum_{j=1}^{k}\ln UH_{j,n}-\ln UH_{k+1,n},$$

further studied by Beirlant et al., 2005.
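The GH estimator can be sketched in Python by applying the Hill-type average to the statistics $UH_{j,n}=X_{n-j:n}H_{j,n}$, with $H_{j,n}$ the Hill estimator based on j top OSs. The form assumed here is $(1/k)\sum_{j=1}^{k}\ln UH_{j,n}-\ln UH_{k+1,n}$, and the function name is hypothetical.

```python
import math


def generalised_hill(sample, k):
    """Generalised Hill estimator built on UH_{j,n} = X_{n-j:n} * Hill(j)."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k <= n - 2:
        raise ValueError("k must satisfy 1 <= k <= n - 2")

    def uh(j):
        threshold = xs[n - j - 1]  # X_{n-j:n}
        hill_j = sum(math.log(xs[n - i] / threshold) for i in range(1, j + 1)) / j
        return threshold * hill_j

    return sum(math.log(uh(j)) for j in range(1, k + 1)) / k - math.log(uh(k + 1))


if __name__ == "__main__":
    data = [math.e ** m for m in range(5)]  # 1, e, e^2, e^3, e^4
    # UH_1 = e^3 * 1, UH_2 = e^2 * 1.5, UH_3 = e * 2
    print(generalised_hill(data, k=2))
    print(1.5 + 0.5 * math.log(1.5) - math.log(2.0))  # same value, by hand
```

The data must be positive and free of ties at the top, since each $UH_{j,n}$ has to be strictly positive for its logarithm to exist.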

5.1.6 ML estimator

Conditional on the OS, $X_{n-k:n}$, k intermediate, $X_{n-i+1:n}-X_{n-k:n}$, $1\le i\le k$, are approximately the k top OSs associated with a sample of size k from $GP_\xi(\alpha x/\xi)$, $\alpha,\xi\in\mathbb{R}$, with $GP_\xi(x)$ given in (2.4). The solution of the ML equations associated with the aforementioned parametrisation (Davison, 1984) gives rise to an explicit EVI estimator, usually called the ML EVI estimator, which can be named PORT-ML, with PORT standing for peaks over random threshold, after Araújo Santos et al., 2006. Such an EVI estimator is given by

$$\hat\xi^{ML}_{k,n}:=\frac{1}{k}\sum_{i=1}^{k}\ln\left(1+\hat\alpha\left(X_{n-i+1:n}-X_{n-k:n}\right)\right),$$

where $\hat\alpha$ is the implicit ML estimator of the unknown ‘scale’ parameter α. A comprehensive study of the asymptotic properties of this ML estimator for ξ > −1/2 has been undertaken by Drees et al., 2004. As recently shown by Zhou 2009, 2010, such an EVI estimator is also valid for ξ > −1. We can also consider the random threshold $X_{n-k:n}$ replaced by a deterministic threshold u, working under the POT methodology.

5.1.7 PWM estimators

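The PWM method, initially derived under parametric frameworks and sketched in Section 4.1, can also be devised under a semi-parametric framework. de Haan & Ferreira 2006 considered the semi-parametric generalised Pareto PWM (GPPWM) EVI estimator, based on the sample of excesses over the high random level $X_{n-k:n}$, that is,

$$\hat\xi^{GPPWM}_{k,n}:=1-\frac{2\,\hat a^{*}_1(k)}{\hat a^{*}_0(k)-2\,\hat a^{*}_1(k)},$$

with $\hat a^{*}_s(k):=\frac{1}{k}\sum_{i=1}^{k}\left(\frac{i}{k}\right)^{s}\left(X_{n-i+1:n}-X_{n-k:n}\right)$, s = 0,1. See also Diebolt et al., 2008c, 2007. Caeiro & Gomes 2011 introduced and studied Pareto PWM (PPWM) EVI estimators, given by

$$\hat\xi^{PPWM}_{k,n}:=1-\frac{\hat a_1(k)}{\hat a_0(k)-\hat a_1(k)},$$

with $\hat a_s(k):=\frac{1}{k}\sum_{i=1}^{k}\left(\frac{i}{k}\right)^{s}X_{n-i+1:n}$, s = 0,1.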

5.1.8 Mixed-moment estimator

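We further refer to the so-called mixed-moment (MM) estimator (Fraga Alves et al., 2009), valid for all $\xi\in\mathbb{R}$ and given by

$$\hat\xi^{MM}_{k,n}:=\frac{\hat\varphi_{k,n}-1}{1+2\min\left(\hat\varphi_{k,n}-1,\,0\right)},\qquad \hat\varphi_{k,n}:=\frac{M^{(1)}_{k,n}-L^{(1)}_{k,n}}{\left(L^{(1)}_{k,n}\right)^{2}},$$

with $M^{(1)}_{k,n}\equiv\hat\xi^{H}_{k,n}$ given in (5.1) and $L^{(1)}_{k,n}:=\frac{1}{k}\sum_{i=1}^{k}\left(1-\frac{X_{n-k:n}}{X_{n-i+1:n}}\right)$. The MM EVI estimator has a very simple form and is asymptotically very close to the ML EVI estimator for a wide class of heavy-tailed models.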
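A compact Python sketch of the MM estimator follows. It assumes the form $(\hat\varphi-1)/(1+2\min(\hat\varphi-1,0))$, with $\hat\varphi=(M^{(1)}-L^{(1)})/(L^{(1)})^{2}$, $M^{(1)}$ the Hill estimator and $L^{(1)}$ the mean of $1-X_{n-k:n}/X_{n-i+1:n}$ over the k top OSs. The function name is illustrative.

```python
import math


def mixed_moment(sample, k):
    """Mixed-moment estimator of a real-valued EVI (assumed standard form)."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]  # X_{n-k:n}
    if threshold <= 0:
        raise ValueError("the threshold X_{n-k:n} must be positive")
    top = [xs[n - i] for i in range(1, k + 1)]           # X_{n-i+1:n}
    m1 = sum(math.log(x / threshold) for x in top) / k   # M^(1), Hill estimator
    l1 = sum(1.0 - threshold / x for x in top) / k       # L^(1)
    phi = (m1 - l1) / (l1 * l1)
    return (phi - 1.0) / (1.0 + 2.0 * min(phi - 1.0, 0.0))


if __name__ == "__main__":
    # Threshold 1, top values 4 and 2: M1 = 1.5 ln 2, L1 = 0.625
    print(mixed_moment([1.0, 2.0, 4.0], k=2))
```

As a sanity check on the formula, for a strict Pareto model the population counterparts are $M^{(1)}=\xi$ and $L^{(1)}=\xi/(1+\xi)$, so that $\varphi-1=\xi$.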

5.1.9 The mean-of-order-p EVI estimator

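A competitive generalisation of the Hill estimator has been recently introduced in the literature. Note that we can write

$$\hat\xi^{H}_{k,n}=\sum_{i=1}^{k}\ln\left(\frac{X_{n-i+1:n}}{X_{n-k:n}}\right)^{1/k}=\ln\left(\prod_{i=1}^{k}\frac{X_{n-i+1:n}}{X_{n-k:n}}\right)^{1/k}.$$

The Hill estimator is thus the logarithm of the geometric mean (or mean of order 0) of $U_{ik}:=X_{n-i+1:n}/X_{n-k:n}$, $1\le i\le k$. More generally, Brilhante et al., 2013 considered as basic statistics the mean of order p (MOP) of $U_{ik}$, $1\le i\le k$, with p ≥ 0, that is, the class of statistics

$$A_p(k)=\begin{cases}\left(\dfrac{1}{k}\displaystyle\sum_{i=1}^{k}U_{ik}^{p}\right)^{1/p}, & \text{if } p>0,\\ \left(\displaystyle\prod_{i=1}^{k}U_{ik}\right)^{1/k}, & \text{if } p=0,\end{cases}$$

and the class of MOP EVI estimators,

$$H_p(k)\equiv\hat\xi^{H_p}_{k,n}:=\begin{cases}\left(1-A_p^{-p}(k)\right)/p, & \text{if } p>0,\\ \ln A_0(k), & \text{if } p=0,\end{cases}$$

with $H_0(k)\equiv\hat\xi^{H}_{k,n}$, given in (5.1). See also Beran et al., 2014.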
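The MOP class is easy to sketch in Python. The code below assumes the form $H_p(k)=(1-A_p^{-p}(k))/p$ for p > 0, where $A_p^{p}(k)$ is the arithmetic mean of $U_{ik}^{p}$ and $U_{ik}=X_{n-i+1:n}/X_{n-k:n}$, and falls back to the Hill estimator at p = 0. The function name is hypothetical.

```python
import math


def mop(sample, k, p):
    """Mean-of-order-p EVI estimator H_p(k); p = 0 gives the Hill estimator."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    if p < 0:
        raise ValueError("p must be non-negative")
    threshold = xs[n - k - 1]  # X_{n-k:n}
    if threshold <= 0:
        raise ValueError("the threshold X_{n-k:n} must be positive")
    ratios = [xs[n - i] / threshold for i in range(1, k + 1)]  # U_{ik}
    if p == 0:
        return sum(math.log(u) for u in ratios) / k
    mean_of_powers = sum(u ** p for u in ratios) / k  # equals A_p(k) ** p
    return (1.0 - 1.0 / mean_of_powers) / p


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0]
    print(mop(data, k=2, p=0.0))   # Hill: 1.5 * ln 2
    print(mop(data, k=2, p=1.0))   # 1 - 1/3
    print(mop(data, k=2, p=1e-6))  # close to the Hill value
```

For a strict Pareto model with EVI ξ and pξ < 1, $E(U^{p})=1/(1-p\xi)$, so the population counterpart of $H_p$ is again ξ, which is one way to see why p acts as a tuning parameter rather than changing the target.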

5.1.10 The PORT EVI estimators

Apart from the Pickands and ML estimators, all aforementioned EVI estimators are scale invariant but not location invariant. In particular, the Hill estimator can suffer drastic changes when we induce a shift in the data, giving rise to the so-called Hill horror plots, a terminology used by Resnick 1997. This led Araújo Santos et al., 2006 to introduce the so-called PORT methodology. The estimators are then functionals of a sample of excesses over a random level $X_{n_q:n}$, $n_q:=\lfloor nq\rfloor+1$, that is, functionals of the sample

$$\underline{X}^{(q)}_n:=\left(X_{n:n}-X_{n_q:n},\ \dots,\ X_{n_q+1:n}-X_{n_q:n}\right). \tag{5.2}$$

Generally, we can have 0 < q < 1, for any $F\in\mathcal{D}_M(EV_\xi)$ (the random level is an empirical quantile). If the underlying model F has a finite left endpoint, $x_F:=\inf\{x:F(x)>0\}$, we can also use q = 0 (the random level can then be the minimum). If we think, for instance, of Hill EVI estimators, the new classes of PORT–Hill EVI estimators, theoretically studied by Araújo Santos et al., 2006 and for finite samples by Gomes et al., 2008b, are given by

$$\hat\xi^{H(q)}_{k,n}:=\frac{1}{k}\sum_{i=1}^{k}\ln\left(\frac{X_{n-i+1:n}-X_{n_q:n}}{X_{n-k:n}-X_{n_q:n}}\right),\quad 0\le q<1. \tag{5.3}$$

A similar dependence on this extra tuning parameter q can be conceived for any other EVI estimator. See also Gomes et al., 2013, 2011. Other interesting location-invariant EVI estimators for Pareto-type tails can be found in Fraga Alves 2001 and Ling et al., 2012.
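The PORT idea can be sketched in a few lines of Python. The sketch assumes that the random level is the empirical quantile X_{n_q:n} with n_q = ⌊nq⌋ + 1, and that the PORT-Hill estimator is simply the Hill estimator computed on the excesses over that level; the name `port_hill` is hypothetical and ties at the random level are not handled.

```python
import math


def port_hill(sample, k, q):
    """Hill estimate computed on the excesses over the random level X_{n_q:n}."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    x = sorted(sample)
    n = len(x)
    n_q = math.floor(n * q) + 1  # assumed rank of the random level
    level = x[n_q - 1]           # X_{n_q:n}; the sample minimum when q = 0
    excesses = [v - level for v in x[n_q:]]  # ascending, length n - n_q
    m = len(excesses)
    if not 1 <= k < m:
        raise ValueError("k must satisfy 1 <= k < n - n_q")
    threshold = excesses[m - k - 1]
    if threshold <= 0.0:
        raise ValueError("threshold excess is not positive (ties at the level)")
    return sum(math.log(excesses[m - i] / threshold) for i in range(1, k + 1)) / k
```

Because only differences to an order statistic enter the computation, adding a constant to every observation leaves the estimate unchanged up to rounding, which is the location invariance that the plain Hill estimator lacks.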

5.2 Main Asymptotic Properties

In order to obtain weak consistency of all these estimators, we only need to assume that F belongs to the MDA, math formula, and that k is an intermediate sequence. However, if we want to obtain asymptotic normality, we need to strengthen this condition into a second-order one, of the form in (2.6). More precisely, assuming that math formula, we can expand each estimator math formula as follows

display math

where N_k is a standard normal RV and math formula. Thus, with math formula denoting a normal RV with mean value μ and variance σ²,

display math(5.4)

5.3 Reduced-bias Estimators

The adequate accommodation of the high asymptotic bias of some of the aforementioned EVI estimators has recently been extensively addressed. We mention the pioneering papers by Peng 1998, Beirlant et al., 1999, Feuerverger & Hall 1999 and Gomes et al., 2000, among others, based on log-excesses and on scaled log-spacings between subsequent extreme OSs from a Pareto-type distribution. In these papers, the authors are led to propose second-order reduced-bias (SORB) EVI estimators, with asymptotic variances larger than or equal to (ξ(1 − ρ)/ρ)², where ρ (< 0) is the aforementioned ‘shape’ second-order parameter, ruling the rate of convergence of the distribution of the normalised sequence of maximum values towards the limiting law EVξ, in (2.1). Similar results in the general MDA, math formula, can be found in Beirlant et al., 2005 and more recently in Cai et al., 2013, who introduced a SORB EVI estimator for ξ around zero, based on the PWM methodology. For those estimators, the asymptotic mean is 0, instead of the value λb, in (5.4), whatever the value of λ. However, the asymptotic variance increases compared with σ⋆².

For Pareto-type models, Caeiro et al., 2009, 2005 and Gomes et al., 2008c, 2007a have been able to reduce the bias without increasing the asymptotic variance, kept at ξ², just as happens with the Hill EVI estimator. Those estimators, called minimum-variance reduced-bias (MVRB) EVI estimators, are all based on an adequate ‘external’ and a bit more than consistent estimation of the pair of second-order parameters, math formula in A(t) = ξβt^ρ, performed through consistent estimators, denoted by math formula, such that math formula, and outperform the classical estimators for all k. Different algorithms for the estimation of (β,ρ) can be found in Gomes & Pestana 2007, among others. Among the most common MVRB EVI estimators, we just mention the simplest class in Caeiro et al., 2005. Such a class has the functional form

display math(5.5)

with math formula as the Hill EVI estimator, in (5.1), and where math formula is an adequate consistent estimator of the aforementioned vector of second-order parameters (β,ρ). Note that the MVRB EVI estimator in (5.5) is easily justified by the fact that in (5.4), b_H = 1/(1 − ρ). For recent overviews on reduced-bias EVI estimation, see Gomes et al., 2007b, Chapter 6 in Reiss & Thomas 1997, Gomes et al., 2008a and Beirlant et al., 2012.
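The bias-correction argument can be made concrete with a short Python sketch. It assumes the form commonly attributed to the simplest MVRB class: since the dominant bias of the Hill estimator is A(n/k)/(1 − ρ) with A(t) = ξβt^ρ, multiplying the Hill estimate by 1 − β̂(n/k)^ρ̂/(1 − ρ̂) removes that component. The function name `mvrb_hill` is ours, and the estimates of (β,ρ) are taken as given inputs; their external estimation is not sketched here.

```python
def mvrb_hill(hill_value, k, n, beta_hat, rho_hat):
    """Bias-corrected Hill estimate, given external estimates of (beta, rho).

    hill_value : Hill estimate computed from the k top order statistics
    k, n       : number of top order statistics and sample size
    beta_hat   : estimate of the scale second-order parameter
    rho_hat    : estimate of the shape second-order parameter (negative)
    """
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    if rho_hat >= 0:
        raise ValueError("rho_hat is expected to be negative")
    # Estimated relative bias of the Hill estimator, A(n/k) / (xi * (1 - rho)).
    relative_bias = beta_hat * (n / k) ** rho_hat / (1.0 - rho_hat)
    return hill_value * (1.0 - relative_bias)
```

With, say, a Hill estimate of 0.6 at k = 100 out of n = 1000 and (β̂, ρ̂) = (1, −1), the estimated relative bias is 0.5 × 0.1 = 0.05, so the corrected value is 0.6 × 0.95 = 0.57.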

5.4 Choice of k

An important problem has often hampered a wide practical use of these estimators: the number k of top OSs used in their implementation depends strongly on the tail itself and needs to be estimated adaptively from the data. The choice of this number is clearly a trade-off between bias and variance: as k increases, the bias grows, because observations for which the limiting tail approximation is less accurate are included, whereas if fewer data are used, the variance increases. It is therefore typically suggested that the optimal value of k should be the one that minimises the mean-squared error (the sum of the squared bias and the variance). However, it has been theoretically shown that this optimum depends on both the sample size and the unknown values of ξ and ρ (see Hall & Welsh, 1985, among others). Therefore, some authors have suggested plotting the value of the ξ estimator as a function of k and judgementally choosing a ‘stable’ point (see Drees et al., 2000; de Sousa & Michailidis, 2004, among others). Nevertheless, this stable part is actually lacking in many cases, and the estimation problem turns into guesswork. Other approaches based on bootstrap methods (Hall, 1990; Draisma et al., 1999; Danielsson et al., 2001; Gomes & Oliveira, 2001) or on regression diagnostics on a Pareto quantile plot (Beirlant et al., 1996c; 2002) have been proposed. We further mention interesting selection procedures based on bias, like the ones in Drees & Kaufmann 1998 and Guillou & Hall 2001. Possible heuristic choices are provided by Gomes & Pestana 2007, Gomes et al., 2008e and Beirlant et al., 2011. The adaptive SORB and MVRB EVI estimation is still in its early stages. We can however mention the recent papers by Gomes et al., 2012, 2013.
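The bias-variance trade-off can be illustrated for the Hill estimator under stated assumptions: asymptotic variance ξ²/k, dominant bias A(n/k)/(1 − ρ) and A(t) = ξβt^ρ with ρ < 0, as in Section 5.3. Minimising the resulting asymptotic mean-squared error in k gives a closed form that does not involve ξ. The Python sketch below (function names are ours) is only a numerical illustration with known (β,ρ); in practice these parameters are unknown, which is precisely why adaptive procedures are needed.

```python
def hill_amse(k, n, xi, beta, rho):
    """Asymptotic MSE of the Hill estimator under A(t) = xi * beta * t**rho."""
    variance = xi ** 2 / k
    bias = xi * beta * (n / k) ** rho / (1.0 - rho)
    return variance + bias ** 2


def hill_optimal_k(n, beta, rho):
    """Real-valued minimiser of hill_amse in k (requires rho < 0, beta != 0)."""
    if rho >= 0 or beta == 0:
        raise ValueError("requires rho < 0 and beta != 0")
    numerator = (1.0 - rho) ** 2 * n ** (-2.0 * rho)
    denominator = -2.0 * rho * beta ** 2
    return (numerator / denominator) ** (1.0 / (1.0 - 2.0 * rho))
```

Setting the derivative of ξ²/k + ξ²β²(n/k)^(2ρ)/(1 − ρ)² to zero gives k^(1−2ρ) = (1 − ρ)²n^(−2ρ)/(−2ρβ²), which is what `hill_optimal_k` returns. For n = 10000, β = 1 and ρ = −1, this is (2 × 10⁸)^(1/3), roughly 585.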

6 Semi-parametric Estimation of Other Parameters

6.1 Classical Semi-parametric Estimation

Despite the fact that the estimation of quantiles, return periods of high levels and exceedance probabilities, among other parameters of extreme events, is at least as important in applications as the estimation of the EVI, we shall only briefly refer to this topic.

6.1.1 High quantile or VaR estimation

In a semi-parametric framework, the most usual estimators of a quantile VaR_p = χ_{1−p} := U(1/p), with p small, can be easily derived from (2.5), through the approximation

display math

The fact that math formula enables us to estimate χ_{1−p} on the basis of this approximation and adequate estimates of ξ and a(n/k). For the simpler case of heavy tails, the approximation turns out to be

display math

and we have

display math

where math formula is any consistent semi-parametric EVI estimator. This estimator is of the type introduced by Weissman 1978. Details on semi-parametric estimation of extremely high quantiles for math formula can be found in Dekkers & de Haan 1989, de Haan & Rootzén 1993, Ferreira et al., 2003 and more recently de Haan & Ferreira 2006. Fraga Alves et al., 2009 also provide, jointly with the MM estimator, accompanying shift and scale estimators that make high-quantile estimation almost straightforward. Other approaches to high-quantile estimation can be found in Matthys & Beirlant 2003. None of the aforementioned quantile estimators reacts adequately to a shift of the data. Araújo Santos et al., 2006 provide a class of semi-parametric VaR_p estimators that enjoy such a feature, the empirical counterpart of the theoretical linearity of any quantile χ_α, namely χ_α(δX + λ) = δχ_α(X) + λ, for any real λ and positive δ. This class of estimators is based on the PORT methodology, providing exact properties for risk measures in finance: translation equivariance and positive homogeneity. For Pareto-type models and a Hill EVI estimation, they are given by

display math

with math formula as the PORT–Hill EVI estimator in (5.3).
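The two quantile estimators of this subsection can be sketched in Python under the following assumptions, which reflect the forms usually found in the literature rather than a transcription of the displays above: the Weissman-type estimator is X_{n−k:n}(k/(np))^ξ̂ with ξ̂ the Hill estimate, and its PORT counterpart is (X_{n−k:n} − X_{n_q:n})(k/(np))^ξ̂ + X_{n_q:n}, with ξ̂ the PORT-Hill estimate and n_q = ⌊nq⌋ + 1. All function names are hypothetical.

```python
import math


def _hill_sorted(x, k):
    """Hill estimate from an ascending list x of positive values."""
    m = len(x)
    if not 1 <= k < m:
        raise ValueError("k must satisfy 1 <= k < len(x)")
    threshold = x[m - k - 1]
    return sum(math.log(x[m - i] / threshold) for i in range(1, k + 1)) / k


def weissman_quantile(sample, k, p):
    """Weissman-type estimate of the quantile exceeded with probability p."""
    x = sorted(sample)
    n = len(x)
    xi_hat = _hill_sorted(x, k)
    return x[n - k - 1] * (k / (n * p)) ** xi_hat


def port_weissman_quantile(sample, k, p, q):
    """PORT version: work on excesses over X_{n_q:n}, then shift back."""
    x = sorted(sample)
    n = len(x)
    n_q = math.floor(n * q) + 1  # assumed rank of the random level
    level = x[n_q - 1]
    excesses = [v - level for v in x[n_q:]]
    xi_hat = _hill_sorted(excesses, k)  # PORT-Hill estimate
    top_excess = excesses[len(excesses) - k - 1]  # X_{n-k:n} - X_{n_q:n}
    return top_excess * (k / (n * p)) ** xi_hat + level
```

Applying `port_weissman_quantile` to the transformed data δx + λ returns δ times the original estimate plus λ (up to rounding), which is the empirical counterpart of the linearity of quantiles mentioned above; `weissman_quantile` is only scale equivariant.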

6.1.2 Probability of exceedance estimation

The estimation of the probability of exceedance of a fixed high level is the dual problem of estimation of a high quantile. It has been dealt with by Dijk & de Haan 1992 and Ferreira 2002, among others. Again for Pareto-type underlying models, we have for the probability of exceedance of a high level x = xn,

display math

See also Caeiro et al., 2014.
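A minimal Python sketch of the dual problem follows, assuming the usual Pareto-type form in which the probability of exceeding a high level x is estimated by (k/n)(x/X_{n−k:n})^(−1/ξ̂), with ξ̂ the Hill estimate. The function names are ours, and the level is required to lie above the threshold X_{n−k:n}.

```python
import math


def hill(sample, k):
    """Hill EVI estimate based on the k largest observations (positive data)."""
    x = sorted(sample)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = x[n - k - 1]
    return sum(math.log(x[n - i] / threshold) for i in range(1, k + 1)) / k


def exceedance_probability(sample, k, level):
    """Estimate P(X > level) for a high level, under a Pareto-type tail."""
    x = sorted(sample)
    n = len(x)
    xi_hat = hill(x, k)
    threshold = x[n - k - 1]  # X_{n-k:n}
    if level <= threshold:
        raise ValueError("level should exceed the threshold X_{n-k:n}")
    return (k / n) * (level / threshold) ** (-1.0 / xi_hat)
```

The duality with high-quantile estimation is visible in the formula: plugging the Weissman-type quantile X_{n−k:n}(k/(np))^ξ̂ in as the level returns p exactly.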

6.1.3 Estimation of other parameters

The estimation of the endpoint of an underlying CDF has been studied by Hall 1982, Csörgő & Mason 1989, Aarssen & de Haan 1994, among others. We further refer to the recent article by Fraga Alves & Neves 2013, dealing with the endpoint semi-parametric estimation of a model in math formula. Estimation of the mean of a heavy-tailed distribution has been undertaken by Peng 2001 and Johansson 2003. Estimation of the conditional tail expectation can be found in Deme et al., 2014. Estimation of the Weibull tail coefficient dates back to Girard 2004. See also Goegebeur et al., 2010, among others.

6.2 SORB Semi-parametric Estimation

Reduced-bias quantile estimators have been studied by Matthys et al., 2004 and Gomes & Figueiredo 2006, who consider the classical SORB EVI estimators. Gomes & Pestana 2007 and Beirlant et al., 2008 incorporate the MVRB EVI estimators in Caeiro et al., 2005 and Gomes et al., 2007b in high-quantile semi-parametric estimation. See also Diebolt et al., 2008b, Beirlant et al., 2009, Caeiro & Gomes 2009 and Li et al., 2010. For a SORB estimation of the Weibull-tail coefficient, we mention Diebolt et al., 2008a. Finally, for a SORB endpoint estimation, we mention Li & Peng 2009.

7 SUE for Censored Data

Censoring and truncation are two of the most relevant concepts in statistics. Although conceptually different, with truncation related to a cut in the support of the underlying model and censoring to a cut in the sample, they are formally quite similar. Whenever working in the area of SUE, and if interested in inference for large values, we are working under a kind of censoring, left censoring, using only the k top OSs. But apart from this, we can also have random right censoring. Censoring occurs both in industrial life testing (i.e. investigation of the distribution of the lifetime of manufactured components) and in medical trials and biological experiments. Terms synonymous with a ‘censored observation’ are thus ‘withdrawal’, ‘loss’ or ‘death due to a competing risk’, while an ‘uncensored observation’ might be a ‘failure’, a ‘relapse’ or a ‘death from the cause under study’. Statistical techniques for analysing censored data sets are quite well studied now, but they mostly concern central characteristics of the underlying distribution. The framework of extreme-value analysis under censoring has not been extensively studied in the literature. To the best of our knowledge, the first to mention the topic were Beirlant et al. (1996a, Section 2.7) and Reiss & Thomas (1997, Section 6.1). Some estimators of tail parameters were then proposed by Beirlant & Guillou 2001 for truncated data and extended to random right censoring by Beirlant et al., 2007 and Einmahl et al., 2008.

Under random censoring, there is an RV Y such that only Z = X ∧ Y and δ = I{X ≤ Y} are observed, with I_A denoting the indicator function of the event A. The indicator variable δ determines whether X has been censored or not. Consequently, we have access to the random sample (Z_i, δ_i), 1 ≤ i ≤ n, of independent copies of (Z,δ), but our goal is to make inference on the RTF of the unknown lifetime distribution, that is, on math formula while F_Y, the CDF of Y, is considered to be a non-parametric nuisance parameter. As mentioned by Einmahl et al., 2008, all the EVI estimators need to be slightly modified in order to be consistent for the estimation of ξ. The following possible and simple modification is suggested in Einmahl et al., 2008 (also Beirlant et al., 2007)

display math

where math formula can be one of the EVI estimators defined in Section 5, but based on the observed sample (Z_1, …, Z_n), and where δ_[1,n], …, δ_[n,n] are the δ's corresponding to math formula being the proportion of non-censored observations among the k largest Z's. Another approach to derive estimators in the case of censoring is to adapt the likelihood to this context (Beirlant et al., 2010). For applications of this methodology to different sets of survival data as well as simulated data, with hints for the adequate EVI, high quantiles and right endpoint estimation of X, see also Gomes & Neves 2011.
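The modification described above can be sketched in Python under the assumption that the adapted estimator is the ratio of a classical EVI estimate, computed on the Z's, to the proportion of non-censored observations among the k largest Z's. The Hill estimator is used here as the classical estimator purely for illustration, and the function name `censored_hill` is ours.

```python
import math


def censored_hill(z, delta, k):
    """Hill estimate on the observed Z's, divided by the proportion of
    non-censored observations among the k largest Z's.

    z     : observed values Z_i = min(X_i, Y_i), assumed positive
    delta : indicators, 1 if X_i <= Y_i (not censored), 0 otherwise
    """
    if len(z) != len(delta):
        raise ValueError("z and delta must have the same length")
    pairs = sorted(zip(z, delta))  # order by Z, carrying the indicators along
    n = len(pairs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = pairs[n - k - 1][0]  # Z_{n-k:n}
    top = pairs[n - k:]              # the k largest Z's with their indicators
    hill_z = sum(math.log(value / threshold) for value, _ in top) / k
    p_hat = sum(indicator for _, indicator in top) / k
    if p_hat == 0:
        raise ValueError("all of the k largest observations are censored")
    return hill_z / p_hat
```

When no observation is censored, the proportion equals 1 and the classical estimate is recovered; when half of the top observations are censored, the classical estimate is doubled.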

8 Testing Issues

8.1 Statistical Choice of EV Models Under Parametric Frameworks

The Gumbel-type CDF, Λ = EV0, and the exponential-type CDF, GP0, with EVξ and GPξ given in (2.1) and (2.4), respectively, are favourites in SUE, essentially because of the simplicity of inference associated with these populations. Additionally, ξ = 0 can be regarded as a change point, because for ξ < 0, the data come from a CDF with a finite right endpoint and for ξ > 0 the right endpoint is infinite. Thus, any separation between EV models, with Λ playing a central and prominent position (called ‘trilemma’ by Tiago de Oliveira), turns out to be an important statistical problem, which has been recently considered under a semi-parametric framework. From a parametric point of view, empirical tests of the hypothesis H0:ξ = 0 versus a sensible one-sided or two-sided alternative, either for the EVD or for the GPD, date back to Jenkinson 1955 and Gumbel 1965. Next, we can find the following in the literature:

  • Quick tests, suggested by heuristic reasons (van Montfort, 1970, 1973; Bardsley, 1977; van Montfort & Otten, 1978; Otten & van Montfort, 1978; Galambos, 1982; Gomes, 1982, 1984; Tiago de Oliveira & Gomes, 1984; van Montfort & Gomes, 1985; van Montfort & Witter, 1985; Gomes & van Montfort, 1986; Brilhante, 2004)
  • Modified locally most powerful tests (Tiago de Oliveira, 1981, 1984; Tiago de Oliveira & Gomes, 1984; Gomes & van Montfort, 1986)
  • Locally asymptotically normal tests (Falk, 1995a, 1995b; Marohn, 1998a, 1998b, 2000; Falk et al., 2008)
  • Goodness-of-fit (GOF) tests for the Gumbel model (Stephens, 1976, 1977, 1986; Kinnison, 1989). The fitting of the GPD to data has been worked out by Castillo & Hadi 1997 and Chaouche & Bacro 2004. The problem of GOF tests for the GPD has been studied by Choulakian & Stephens 2001 and Luceño 2006, among others. Further non-parametric tests appear in Jurečková & Picek 2001
  • Tests from large sample theory, like the likelihood ratio test and Wald test, among others (Hosking, 1984; Gomes, 1989b)

Statistical choice in the MEV model is developed by Gomes 1984, 1987, 1989a, 1989b, Gomes & Alpuim 1986, Hasofer & Wang 1992, Wang 1995, Fraga Alves & Gomes 1996 and Wang et al., 1996. Some of these authors already go beyond the extremal process, working under a semi-parametric framework.

8.2 Semi-parametric Framework

Under a semi-parametric framework, it is naturally sensible to test the hypothesis

display math

or the corresponding one-sided alternatives. Tests of this nature can already be found in several papers prior to 2000, among which we mention Galambos 1982, Castillo et al., 1989, Hasofer & Wang 1992, Fraga Alves & Gomes 1996, Wang et al., 1996, Marohn 1998a, 1998b and Fraga Alves 1999. More recently, further testing procedures of this type can be found in Segers & Teugels 2000, Neves et al., 2006 and Neves & Fraga Alves 2007, among others. The testing of first-order EV conditions can be dated back to Dietrich et al., 2002, who propose a test statistic to check whether the hypothesis math formula is supported by the data, together with a simpler version devised to test whether math formula. Further results of this last nature can be found in Beirlant et al., 2006. Drees et al., 2006 deal with the testing of math formula. Accurate tables of critical points of this statistic are provided by Hüsler & Li 2006. See also Canto e Castro et al., 2011. Up-to-date reviews of the topic can be found in Neves & Fraga Alves 2008 and Hüsler & Peng 2008.

9 Dependent Frameworks—A Brief Reference to the Extremal Index Estimation

The same EVξ CDF, in (2.1), appears as the limiting CDF of the maximum for a large class of stationary sequences, {X_n}_{n≥1}, like the ones for which the mixing condition D, introduced by Leadbetter et al., 1983, holds. Let us assume we have data from a stationary process with an underlying CDF, F, and let {Y_n}_{n≥1} be the associated IID sequence (from the same model F). Under adequate local dependence conditions, the limiting CDF of the maximum X_{n:n} of the stationary sequence may be directly related to the maximum, Y_{n:n}, of the IID associated sequence, through a new parameter, the so-called extremal index (EI).

More specifically, the stationary sequence {X_n}_{n≥1} has an extremal index θ (0 < θ ≤ 1) if, for every τ > 0, we may find a sequence of levels u_n = u_n(τ) such that

display math

The extremal index θ may thus be informally defined by the approximation

display math

where F(·) is the marginal CDF of a strictly stationary sequence {X_n}_{n≥1}, satisfying adequate local and asymptotic dependence conditions. One of the local dependence conditions that enable us to guarantee the existence of an extremal index is the D′′ condition, introduced by Leadbetter & Nandagopalan 1989. Under the validity of such a condition, the EI can also be defined as the reciprocal of the ‘mean time of duration of extreme events’, being directly related to the exceedances of high levels. Indeed, we have

display math

where u_n is a sequence of values such that F(u_n) = 1 − τ/n + o(1/n), as n → ∞. Then, given a sample (X_1, …, X_n), an obvious non-parametric estimator of θ is immediately suggested: once a suitable threshold u is chosen, put

display math

In order to have consistency of this estimator, the high level u = u_n must be such that n(1 − F(u_n)) = c_n τ =: τ_n, with τ_n → ∞ and τ_n/n → 0 (Nandagopalan, 1990). It is sensible to replace the deterministic level u by the stochastic level X_{n−k:n} and to consider the EI estimator as a function of k, the number of OSs higher than the chosen threshold, obtaining the upcrossing (UC) EI estimator

display math(9.1)

We are thus in a situation similar to the one encountered in semi-parametric EVI estimation, with consistency attained only if k is intermediate. Reduced-bias versions of the UC EI estimator, in (9.1), have been obtained by Gomes et al., 2008d. Limit theorems for empirical processes of cluster functionals have been obtained by Drees & Rootzén 2010.
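A hedged Python sketch of the UC EI estimator follows. It assumes the usual form in which the number of upcrossings of the random level X_{n−k:n}, that is, the number of indices i with X_i ≤ X_{n−k:n} < X_{i+1}, is divided by k, the number of OSs above that level. The function name `upcrossing_ei` is ours, and no bias reduction is attempted.

```python
def upcrossing_ei(series, k):
    """Upcrossing estimate of the extremal index at the random level X_{n-k:n}.

    series : observations in their original time order
    k      : number of order statistics above the random threshold
    """
    values = list(series)
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = sorted(values)[n - k - 1]  # X_{n-k:n}
    # Count the passages from at-or-below the threshold to strictly above it.
    upcrossings = sum(
        1 for current, following in zip(values, values[1:])
        if current <= threshold < following
    )
    return upcrossings / k
```

On the toy series 1, 5, 2, 6, 3, 7, 4, 8 with k = 4, every exceedance of the level 4 is preceded by a value at or below it, so the estimate is 4/4 = 1 (no clustering). On the increasing series 1, 2, …, 8 with k = 4, the four exceedances form a single cluster and the estimate is 1/4.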

10 Other Related Topics and Open Problems

We next follow closely the discussion of Beirlant et al., 2012, as most of the topics considered there still deserve attention. Indeed, SUE is a quite lively topic of research. Important developments have appeared recently in the area of spatial extremes, where parametric models have again become relevant. Now that we have access to highly sophisticated computational techniques, a great variety of parametric models can further be considered. In a semi-parametric framework, topics like robustness and extremes, threshold selection, trends and change points in the tail behaviour and clustering, among others, are still quite challenging.

10.1 Penultimate Approximations

An important problem in EVT concerns the rate of convergence of F^n(a_n x + b_n) towards EVξ(x), in (2.1), or, equivalently, the search for estimates of the difference

display math

Indeed, parametric inference on the RTF, usually unknown, is carried out on the basis of the identification of F^n(a_n x + b_n) and of EVξ(x), and the rate of convergence can validate or not the most usual models in SUE. As noted by Fisher & Tippett 1928, for the normal CDF, math formula, the convergence of Φ^n(a_n x + b_n) towards G0(x) is extremely slow. They then show that Φ^n(x) is ‘closer’ to a suitable penultimate Weibull CDF than to the Gumbel CDF. Such an approximation is the so-called penultimate approximation, and several penultimate models have been advanced. Dated overviews of the modern theory of rates of convergence in EVT, introduced by Anderson 1971, can be seen in Galambos 1984 and Gomes 1994. More recently, Gomes & de Haan 1999 derived, for all math formula, exact penultimate approximation rates with respect to the variational distance, under adequate differentiability assumptions. Kaufmann 2000 proved, under weaker conditions, a result related to the one in Gomes & de Haan 1999. This penultimate or pre-asymptotic behaviour has further been studied by Raoult & Worms 2003 and Diebolt & Guillou 2005, among others. Quite recently, the role of MS penultimate approximations in reliability has been considered by Reis et al., 2013. Dealing with regular and homogeneous parallel–series systems, these authors assess the gain in accuracy when a penultimate approximation is used instead of the ultimate one. Other types of penultimate approximations have been considered in the unpublished paper by Smith 1987b. Among them, we mention a penultimate EV parametric model of the type

display math

or the associated penultimate GP CDF

display math

This type of model surely deserves a deeper consideration under statistical backgrounds. Penultimate models seem to be possible and interesting alternatives to the classical models but have never been deeply used in the literature. Shouldn't we invest more on this type of models from an inferential point of view?

10.2 Max-semistable Laws as Alternative Parametric Models

We also refer to the class of max-semistable (MSS) laws, introduced by Grinevich 1992a, 1992b and Pancheva 1992 and further studied by Canto e Castro et al., 2002 and Temido & Canto e Castro 2003. Such a class is more general than the class of max-stable (MS) laws, given in (2.1). The possible MSS laws are

display math

where ν(·) is a positive, bounded and periodic function. Taking ν identically equal to 1, we obtain the MS laws in (2.1). Discrete models like the geometric and negative binomial, and some multimodal continuous models, are in math formula but not in math formula. A survey of the topic can be found in Pancheva 2010. Generalised Pickands statistics have been used by Canto e Castro & Dias 2011 to develop methods of estimation in the MSS context. See also Canto e Castro et al., 2011. Such a diversity of models, if duly exploited from a statistical point of view, can surely provide fruitful topics of research, in both parametric and semi-parametric set-ups.

10.3 Robustness and Extremes

In many statistical applications, outliers occur and can have a disproportionate effect on the estimation procedures. Some robust algorithms, replacing the classical statistical methodologies, can be useful, leading to alternative outlier-resistant estimators. In the context of EVT, this notion of robustness can appear at first sight as a contradiction because the aim is to reduce the influence of extreme observations whereas EVT mainly focuses on these data points (e.g. Dell'Aquila & Embrechts, 2006). However, tail parameters can be seriously affected by these suspicious observations, and thus, robust methods are required. Such a topic has been recently studied in the literature. We can mention among others Brazauskas & Serfling 2000 and Vandewalle et al., 2007 for strict Pareto-type distributions and Dupuis & Field 1998, Peng & Welsh 2001 and Juárez & Schucany 2004 for EVDs or GPDs. Different approaches can be used, for instance, the density power divergence criterion introduced by Basu et al., 1998 (Dierckx et al., 2013). Other methods based on an adaptation of classical tail parameter estimators, such as Hill's, can be proposed (e.g. Beran et al., 2014). In all these cases, the estimators depend on a single parameter that controls the trade-off between robustness and efficiency. The choice of this parameter is delicate, and its adaptive selection, depending on the data, is a challenging open problem. From a practical point of view, it is also difficult to be convinced of the presence or absence of outliers in a data set. Thus, it is recommended to compare robust and non-robust estimators. In case of a strong discrepancy between the estimates, we can suspect the presence of outliers in the tails.

10.4 The PORT Methodology

In SUE, most of the methods of estimation are dependent on the log-excesses or scaled log-spacings and do not react adequately to changes in the location of the model underlying the data. The steps taken in PORT estimation of parameters of extreme events are promising but still quite incomplete at the current state of the art. For PORT quantile estimation, we mention Henriques-Rodrigues & Gomes 2009. The shift-invariant versions, dependent on the tuning parameter q in (5.2), have properties similar to the ones of the original estimator T, provided we keep to adequate k-values and choose an adequate tuning parameter q. Recent research on this topic can be seen in Gomes et al., 2013, 2011, but further theoretical research is welcome. The PORT estimation of a shape second-order parameter has been dealt with by Henriques-Rodrigues et al., 2013.

10.5 Adaptive Selection of Sample Fraction or Threshold

A threshold is often set ‘almost arbitrarily’ (for instance at the 90% or 95% sample quantile). However, the choice of the threshold, or equivalently of the number k of top OSs to be used, is crucial for a reliable estimation of any parameter of extreme events. The topic has already been extensively studied for classical and even reduced-bias EVI estimators, as mentioned in Section 5.4. Is it sensible to use computationally intensive bootstrap procedures for threshold selection, or are there simpler techniques, possibly related to the bias pattern? Is it possible to apply a similar methodology for the estimation of other parameters of extreme events?

10.6 Other Possible Topics of Research in SUE

Recent parametric models, like the EV Birnbaum–Saunders model in Ferreira et al., 2012, can also become relevant in the area of SUE. Testing whether math formula, for a certain ξ, is a crucial topic, already dealt with in several articles referred to in Section 8. And what about testing second-order and even third-order conditions? Change-point detection is also a challenging topic of research. And SUE for weakly dependent data, with all problems related to clustering of EVs, still deserves further research. SUE for randomly censored data and estimation of the endpoint are also still relevant topics in SUE. We mention the recent papers by Einmahl & Magnus 2008, Li & Peng 2009, Einmahl & Smeets 2011, Henriques-Rodrigues et al., 2011, Li et al., 2011 and Fraga Alves & Neves 2013. Moreover, the estimation of second-order and higher-order parameters still deserves further attention, particularly because of the importance of such estimation in SORB estimators of parameters of extreme events.


M. Ivette Gomes has been partially supported by National Funds through FCT—Fundação para a Ciência e a Tecnologia, project PEst-OE/MAT/UI0006/2014. We also would like to thank the reviewers and the Associate Editor for their valuable comments that improved a first version of this review.